March 4, 2026

Harmony Thrive

Patient and nodule characteristics associated with adherence to lung cancer screening in a large integrated healthcare system

Data source and study population

We obtained 2012–2021 patient-level EHR data from the University of Florida (UF) Health Integrated Data Repository (IDR), a clinical data warehouse aggregating patient information from UF’s various clinical and administrative systems, including the Epic EHR system. The IDR contains more than one billion observational data elements from more than two million patients, encompassing structured data such as patient demographics, diagnoses, medical procedures, vital signs, laboratory tests, and medications, as well as unstructured clinical narratives such as discharge summaries, order notes, and pathology reports. This study was approved by the UF Institutional Review Board (IRB). All methods were performed in accordance with relevant guidelines and regulations.

The UF Health lung cancer screening program was implemented in 2014, shortly after the U.S. Preventive Services Task Force (USPSTF) recommendations for low-dose computed tomography (LDCT) screening were established. The program adheres to national guidelines, which are updated in accordance with USPSTF revisions. Additionally, the Lung-RADS classification system, introduced by the American College of Radiology in 2014, was adopted early by the UF Health lung cancer screening program and has been used consistently to guide follow-up recommendations. Patients were typically referred for lung cancer screening by their primary care providers or pulmonary physicians, who assessed eligibility based on guideline criteria. LDCT results were communicated to patients through the electronic medical record system, where complete radiology reports were accessible. However, there was no standardized institutional protocol for communicating results. As a result, communication practices varied by provider, ranging from brief summaries of the Lung-RADS category and recommended follow-up to detailed discussions of specific nodule findings. This variability may have influenced patients’ understanding of their risk and their adherence to follow-up recommendations.

We identified patients who underwent at least one LDCT procedure between October 1, 2014 and October 31, 2021 in UF Health IDR data using Current Procedural Terminology (CPT) codes based on their effective date ranges (S8032, effective October 1, 2014 to September 30, 2016; G0297, effective February 5, 2015 to December 31, 2020; and 71271, effective from January 1, 2021 onward). For each patient, the date of the first LDCT was defined as the index date. We excluded patients: (1) who did not qualify for LDCT screening (i.e., were not current or former smokers, or whose age at the initial LDCT did not meet the USPSTF eligibility criteria: age 55–80 per the 2013 guideline if before March 2021, and age 50–80 per the 2021 guideline if on or after March 2021); (2) who had no encounter records within one year before the index date, to ensure sufficient prior data for measuring baseline characteristics; (3) whose follow-up period (from the index date to their last EHR visit) was shorter than the Lung-RADS recommended follow-up time minus 3 months; (4) whose follow-up period (from the index date to the study end date, October 31, 2021) was shorter than the Lung-RADS recommended follow-up time plus 3 months; (5) who could not be adherent due to death, a lung cancer diagnosis, or being older than 80 years during the follow-up period; and (6) who had received a non-screening chest CT scan within the maximum follow-up window, as these scans could preclude adherence to Lung-RADS-defined follow-up LDCT protocols and lead to misclassification of adherence status.
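The code-and-date logic above can be sketched in Python as follows. This is only an illustration, not the study's extraction code: the function names and the `(code, date)` input layout are hypothetical, and March 1, 2021 is an assumed cutover day because the text gives only the month.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Effective date ranges as stated in the text; None means open-ended.
LDCT_CODES = {
    "S8032": (date(2014, 10, 1), date(2016, 9, 30)),
    "G0297": (date(2015, 2, 5), date(2020, 12, 31)),
    "71271": (date(2021, 1, 1), None),
}
STUDY_START = date(2014, 10, 1)
STUDY_END = date(2021, 10, 31)
# Assumption: the text says "March 2021" without a day; the 1st is used here.
GUIDELINE_CUTOVER = date(2021, 3, 1)


def is_ldct_procedure(code: str, service_date: date) -> bool:
    """True if the code is an LDCT code that was in effect on service_date."""
    if code not in LDCT_CODES:
        return False
    start, end = LDCT_CODES[code]
    return start <= service_date and (end is None or service_date <= end)


def find_index_date(procedures: Iterable[Tuple[str, date]]) -> Optional[date]:
    """Earliest qualifying LDCT date within the study window, or None."""
    dates = [
        d for code, d in procedures
        if STUDY_START <= d <= STUDY_END and is_ldct_procedure(code, d)
    ]
    return min(dates) if dates else None


def meets_age_criterion(age: int, ldct_date: date) -> bool:
    """Age 55-80 before the cutover (2013 guideline), 50-80 on or after it."""
    min_age = 55 if ldct_date < GUIDELINE_CUTOVER else 50
    return min_age <= age <= 80
```

Under these assumptions, a 52-year-old screened in June 2021 passes the age check, while the same patient screened in June 2020 does not. Smoking status and the remaining exclusion criteria would be applied as separate filters.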

Due to data limitations, pack-year history and time since quitting smoking were unavailable; therefore, eligibility for LDCT screening was determined based on age and smoking status alone.

Study outcome

The primary outcome was whether a patient who had received an initial LDCT was adherent to the Lung-RADS recommended follow-up schedule for LDCT. Specifically, the Lung-RADS recommended follow-up interval is 12 months for categories 1 (i.e., negative) and 2 (i.e., benign appearance or behavior), 6 months for category 3 (i.e., probably benign), and 3 months for category 4A (i.e., suspicious). For Lung-RADS categories 4B and 4X (i.e., highly suspicious), immediate chest CT or PET/CT with or without biopsy is recommended, but no standard follow-up is prescribed [29]. We included patients whose initial LDCT was in Lung-RADS category 1, 2, 3, or 4A, which involve standard follow-up rather than immediate intervention. Lung-RADS categories for the initial LDCT were extracted from lung cancer screening order narratives using our previously developed rule-based approach [30]. Lung-RADS categories are often documented in radiology reports with specific patterns, including numbers and letters (e.g., “Lung-RADS category: 4A”). Our rule-based approach, using regular expressions to capture these patterns, achieved an F1-score of 0.998. Adherence to follow-up LDCT was defined as undergoing the second LDCT within ±3 months of the recommended follow-up interval after the initial LDCT.
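A minimal sketch of the two steps described above, category extraction and the adherence check, might look like the following. The regular expression is a simplified, hypothetical pattern and not the authors' validated rule set; the helper names, the calendar-month arithmetic, and the inclusive window boundaries are likewise assumptions.

```python
import calendar
import re
from datetime import date
from typing import Optional

# Hypothetical pattern for phrases such as "Lung-RADS category: 4A".
LUNG_RADS_RE = re.compile(
    r"Lung-?\s*RADS(?:\s+category)?\s*[:=]?\s*(4\s?[ABX]|[0-4])\b",
    re.IGNORECASE,
)

# Recommended follow-up intervals (months) as stated in the text.
RECOMMENDED_MONTHS = {"1": 12, "2": 12, "3": 6, "4A": 3}
WINDOW_MONTHS = 3


def extract_lung_rads(report_text: str) -> Optional[str]:
    """Return the first Lung-RADS category found (e.g. '4A'), else None."""
    match = LUNG_RADS_RE.search(report_text)
    if match is None:
        return None
    return re.sub(r"\s+", "", match.group(1)).upper()


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_adherent(index_date: date, category: str,
                second_ldct: Optional[date]) -> bool:
    """True if the second LDCT falls within the recommended interval +/- 3 months.

    Assumption: window boundaries are inclusive and the second scan must
    occur after the index date.
    """
    if second_ldct is None or second_ldct <= index_date:
        return False
    months = RECOMMENDED_MONTHS[category]
    window_start = add_months(index_date, months - WINDOW_MONTHS)
    window_end = add_months(index_date, months + WINDOW_MONTHS)
    return window_start <= second_ldct <= window_end
```

For example, with an index date of January 15, 2019 and category 3, the window under these assumptions runs from April 15 to October 15, 2019, so a second LDCT on July 20, 2019 would count as adherent.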

Predictors of interest

The predictors of interest included socio-demographic, clinical, and pulmonary nodule characteristics. The socio-demographic characteristics included age at index date, sex, race-ethnicity, census tract-level rurality and poverty, smoking status, primary payer for the initial LDCT, baseline healthcare utilization, and marital status, whereas the clinical characteristics included family cancer history, baseline chronic obstructive pulmonary disease (COPD) status, and Charlson comorbidity index (CCI) [31]. Census tract-level rurality was determined by linking patients’ latest ZIP codes in the EHRs to the Rural-Urban Commuting Area (RUCA) codes [32] and categorizing patients as urban (RUCA code 1) or non-urban (RUCA code 2–10) residents. Census tract-level poverty, defined as the percentage of the population below the poverty line, was determined by linking patients’ latest ZIP codes to the Census Bureau’s American Community Survey and categorizing them into 3 groups: < 10%, 10%–19%, and ≥ 20%. Smoking status (i.e., current or former smoker) and marital status (i.e., married/partnered, single, or other) were determined using the most recent EHR status before the index date. The primary payer for the initial LDCT was categorized as Medicare, commercial, Medicaid, or other (e.g., charity, worker’s compensation, managed care, federal/state/local government insurance, self-pay). Baseline healthcare utilization was measured using the numbers of outpatient and inpatient visits within one year prior to the index date. Family history of any cancer (ICD-9: V16; ICD-10: Z80) was extracted from structured EHR data prior to the index date. Additionally, baseline COPD (ICD-9: 490–496; ICD-10: J40-J44) and CCI were extracted from EHR data within 12 months prior to the index date. We calculated the CCI following the modified algorithm by Klabunde et al. [31]. CCI was categorized into 3 groups: no comorbidity (CCI = 0), some comorbidity (CCI = 1), and a substantial burden of comorbidities (CCI ≥ 2).
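The categorical groupings described above could be coded along these lines. The function names and label strings are hypothetical, values between 19% and 20% poverty are assumed to fall in the middle group, and missing inputs map to an "unknown" level as an illustrative choice.

```python
from typing import Optional


def rurality_group(ruca_code: Optional[int]) -> str:
    """Urban for RUCA code 1, non-urban for codes 2-10, else unknown."""
    if ruca_code == 1:
        return "urban"
    if ruca_code is not None and 2 <= ruca_code <= 10:
        return "non-urban"
    return "unknown"


def poverty_group(pct_below_poverty: Optional[float]) -> str:
    """Three poverty groups from the text.

    Assumption: the middle group covers 10% up to (but not including) 20%.
    """
    if pct_below_poverty is None:
        return "unknown"
    if pct_below_poverty < 10:
        return "<10%"
    if pct_below_poverty < 20:
        return "10%-19%"
    return ">=20%"


def cci_group(cci: int) -> str:
    """CCI = 0, CCI = 1, and CCI >= 2, as defined in the text."""
    if cci < 0:
        raise ValueError("CCI cannot be negative")
    if cci == 0:
        return "no comorbidity"
    if cci == 1:
        return "some comorbidity"
    return "substantial comorbidity burden"
```

Keeping each grouping in its own small function makes the cut points easy to audit against the definitions in the text.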

Pulmonary nodule characteristics included Lung-RADS categories (extracted using the rule-based algorithm described above) and nodule characteristics, both extracted using NLP from unstructured EHR data. Five categories of nodule characteristics were extracted from clinical notes and included in this study as predictors of adherence to follow-up LDCT: the number of nodules, the largest nodule size (0, < 6 mm, 6–8 mm, > 8 mm), nodule texture (calcified, ground glass, noncalcified, soft, solid, other), laterality (left, right, bilateral, other), and site (lower, middle, upper, other). The pulmonary nodules and associated nodule characteristics were extracted from radiology reports using an NLP system with state-of-the-art transformer models, which we developed and validated previously using UF Health EHRs [30]. The NLP system integrated the robustly optimized BERT approach (RoBERTa)-mimic model for concept extraction, the A Lite BERT (ALBERT)-base model for relation identification, and the RoBERTa-mimic model for negation detection. Our end-to-end NLP system for extracting pulmonary nodules and nodule characteristics achieved an F1-score of 0.8869 (precision = 0.8345 and recall = 0.9464).

Statistical analysis

We calculated summary statistics to describe the study characteristics in the overall population and by Lung-RADS category. Continuous variables were presented as means with standard deviations for those following a normal distribution or as medians with interquartile ranges (25th and 75th percentiles) for those that were skewed. Categorical variables were summarized using frequencies and percentages. Normality of continuous data was assessed using the Kolmogorov-Smirnov test. Differences in study characteristics across Lung-RADS categories were evaluated using analysis of variance (ANOVA) or the Kruskal-Wallis test for continuous variables, and the Chi-squared or Fisher’s exact test for categorical variables. For variables with missing values, we created an “unknown” category and included it in both the univariate comparisons and the regression models to retain the full analytic sample. Other variables had no missing values. We built univariable and multivariable logistic regression models to examine the factors associated with adherence to screening. Separate models were built for patients in Lung-RADS category 1 and those in categories 2–4A because over 90% of the patients in category 1 had no nodules. Pulmonary nodule characteristics were used as predictors in the model for patients in Lung-RADS categories 2–4A only. To assess whether the associations between nodule characteristics and adherence differed by Lung-RADS category (2–4A), we tested interaction terms between each nodule characteristic and Lung-RADS category. All effects were estimated as odds ratios (ORs) with 95% confidence intervals (CIs). Two-sided p-values were calculated for all statistics, with a significance level of 0.05. Data processing and management were conducted using Python 3.9.4. Statistical analyses were conducted using SAS 9.4 (SAS Institute Inc., Cary, NC, USA).
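To make the OR and CI reporting concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table, which matches what a univariable logistic regression with a single binary predictor yields. The counts in the usage note are invented for illustration, the function name is hypothetical, and the use of Wald intervals is an assumption; the actual analyses were run in SAS.

```python
import math
from statistics import NormalDist
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       level: float = 0.95) -> Tuple[float, float, float]:
    """Odds ratio and Wald CI from a 2x2 table.

    a: exposed and adherent        b: exposed and non-adherent
    c: unexposed and adherent      d: unexposed and non-adherent
    All cells must be positive for the log odds ratio to be defined.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))
```

With hypothetical counts `odds_ratio_wald_ci(20, 10, 10, 20)`, the odds ratio is 4.0 and the interval excludes 1, which corresponds to a two-sided p-value below 0.05. Multivariable models additionally adjust for the other covariates, which this sketch does not do.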
