Extracting forced vital capacity from the electronic health record through natural language processing in rheumatoid arthritis-associated interstitial lung disease

Bryant R. England, Punyasha Roul, Yangyuna Yang, Daniel Hershberger, Harlan Sayles, Jorge Rojas, Grant W. Cannon, Brian C. Sauer, Jeffrey R. Curtis, Joshua F. Baker, Ted R. Mikuls

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Purpose: To develop a natural language processing (NLP) tool to extract forced vital capacity (FVC) values from electronic health record (EHR) notes in patients with rheumatoid arthritis-interstitial lung disease (RA-ILD).

Methods: We selected RA-ILD patients (n = 7485) in the Veterans Health Administration (VA) between 2000 and 2020 using validated ICD-9/10 codes. We identified numeric values in proximity to FVC string patterns from clinical notes in the EHR. Subsequently, we performed processing steps to account for variability in note structure, related pulmonary function test (PFT) output, and values copied across notes, then assigned dates from linked administrative procedure records. NLP-derived FVC values were compared to values recorded directly from PFT equipment available on a subset of patients.

Results: We identified 5911 FVC values (n = 1844 patients) from PFT equipment and 15 383 values (n = 4982 patients) by NLP. Among 2610 date-matched FVC values from NLP and PFT equipment, 95.8% of values were within 5% predicted. The mean (SD) difference was 0.09% (5.9), and values strongly correlated (r = 0.94, p < 0.001), with a precision of 0.87 (95% CI 0.86, 0.88). NLP captured more patients with longitudinal FVC values (n = 3069 vs. n = 1164). Mean (SD) change in FVC %-predicted per year was similar between sources (−1.5 [30.0] NLP vs. −0.9 [16.6] PFT equipment; standardized response mean = 0.05 for both).

Conclusions: NLP of EHR notes increases the capture of accurate, longitudinal FVC values by three-fold over PFT equipment. Use of this NLP tool can facilitate pharmacoepidemiologic research in RA-ILD and other lung diseases by capturing this critical measure of disease severity.
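The extraction and comparison steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' tool: the regular expression, the 10 to 150 plausibility window, the record fields (`patient`, `pft_date`, `fvc_pct`), and the function names are all assumptions made for illustration, and the abstract does not specify the actual string patterns or processing rules.

```python
import re
from statistics import mean

# Hypothetical pattern (assumption, not the study's rules): "FVC" followed
# within 30 non-digit characters by a number and a percent sign. The
# lookbehind skips the FEV1/FVC ratio, one example of related PFT output.
FVC_PCT = re.compile(
    r"(?<!/)\bFVC\b[^\d\n]{0,30}?(\d{1,3}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_fvc_pct(note_text):
    """Return plausible FVC %-predicted values found in one note."""
    values = []
    for match in FVC_PCT.finditer(note_text):
        value = float(match.group(1))
        if 10 <= value <= 150:  # assumed plausibility window
            values.append(value)
    return values


def drop_copied_values(records):
    """Keep the first occurrence of each (patient, test date, value) triple,
    so a value copied forward into later notes is counted once."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["patient"], rec["pft_date"], rec["fvc_pct"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def agreement(pairs, tolerance=5.0):
    """For date-matched (nlp, pft) pairs, return the share of pairs whose
    absolute difference is within `tolerance` and the mean difference."""
    diffs = [nlp - pft for nlp, pft in pairs]
    within = sum(abs(d) <= tolerance for d in diffs) / len(diffs)
    return within, mean(diffs)
```

The pattern is deliberately conservative: it will not read past an intervening number, so a line reporting liters before percent predicted would be missed rather than risk picking up a neighboring measure. A real pipeline would need broader patterns plus the note-structure handling the Methods mention.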

Original language: English (US)
Article number: e5744
Journal: Pharmacoepidemiology and Drug Safety
Volume: 33
Issue number: 1
State: Published - Jan 2024

Keywords

  • electronic health record
  • forced vital capacity
  • interstitial lung disease
  • natural language processing
  • pulmonary function test
  • rheumatoid arthritis

ASJC Scopus subject areas

  • Epidemiology
  • Pharmacology (medical)
