1. Doe and colleagues assessed whether AI decision support software improves spirometry interpretation by primary care clinicians.
2. Clinicians with access to the AI software achieved higher diagnosis prediction performance than those in the control group.
Evidence Rating Level: 1 (Excellent)
Study Rundown: Spirometry is widely used for diagnosing asthma and chronic obstructive pulmonary disease (COPD). However, its interpretation is highly variable in primary care, leading to underdiagnosis and misdiagnosis of chronic respiratory diseases. Doe and colleagues evaluated whether AI decision support software can improve spirometry interpretation performance among primary care clinicians. Participants were general practitioners (GPs) or practice nurses who routinely interpret spirometry results. They were randomly assigned 1:1 to either the control group or the intervention group, with only the latter having access to AI spirometry interpretation reports. The primary outcome was the preferred diagnosis prediction performance. The study found a statistically significantly higher preferred diagnosis prediction performance in the AI intervention group than in the control group (58.7% vs. 49.7%). Additionally, the AI intervention group more often correctly graded the technical quality of forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC). Overall, this study demonstrated that adding AI decision support software significantly improved the diagnosis prediction performance of primary care clinicians interpreting spirometry.
Click here to read the study in NEJM AI
Relevant Reading: Artificial Intelligence for Spirometry Quality Evaluation: A Systematic Review
In-Depth [randomized controlled trial]: All participants interpreted 50 real-world patient spirometry records and were given a brief history, basic demographic data, and the spirometry report. Participants in the intervention group also received an AI decision-support software report. The primary endpoint was the preferred diagnosis prediction performance, defined as the percentage of correct diagnoses. Secondary endpoints included the correct grading of the technical quality of FEV1 and FVC, and self-rated diagnostic confidence. A total of 119 clinicians were assigned to the control group and 115 to the intervention group. However, only 66 in the control group and 67 in the intervention group completed the study. Participants in the intervention group achieved a higher preferred diagnosis prediction performance (mean ± standard deviation (SD), 58.7 ± 7.0% vs. 49.7 ± 16.6%). The mean difference between the two groups was 9.0 percentage points (95% confidence interval (CI), 4.5 to 13.3; p = 0.001). For the secondary outcomes, the mean differences between the intervention and control groups in the correct grading of the technical quality of FEV1 and FVC were 5.0 (95% CI, 3.1 to 7.0) and 10.8 (95% CI, 7.5 to 13.9), respectively. There was no statistically significant difference between the two groups in self-rated diagnostic confidence. This study was limited by a higher-than-expected non-completion rate and by the lack of an established minimum clinically important difference in diagnostic performance. Nonetheless, this study provided promising evidence that AI software can improve spirometry interpretation performance in primary care settings.
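For readers who want to check the arithmetic behind these figures, the following is a minimal sketch that uses only the numbers reported above. The function names are hypothetical, and the interval is a simple normal approximation for a difference in means with unequal variances. It is an assumption for illustration, not necessarily the method the study authors used, so it is expected to land near the reported interval rather than reproduce it exactly.

```python
import math


def completion_rate(completed: int, assigned: int) -> float:
    """Fraction of assigned participants who completed the study."""
    return completed / assigned


def approx_difference_interval(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Difference in means with an approximate 95% interval.

    Assumption: normal approximation with unpooled variances. This is an
    illustrative check only, not the trial's own analysis.
    """
    diff = mean_a - mean_b
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff, diff - z * standard_error, diff + z * standard_error


if __name__ == "__main__":
    # Figures as reported in the summary above.
    control_rate = completion_rate(completed=66, assigned=119)
    intervention_rate = completion_rate(completed=67, assigned=115)
    print(f"Completion, control:      {control_rate:.1%}")
    print(f"Completion, intervention: {intervention_rate:.1%}")

    # Intervention: 58.7 ± 7.0 (n = 67); control: 49.7 ± 16.6 (n = 66).
    diff, lower, upper = approx_difference_interval(58.7, 7.0, 67, 49.7, 16.6, 66)
    print(f"Mean difference: {diff:.1f} points ({lower:.1f} to {upper:.1f})")
```

Under these assumptions, a little over half of the participants in each group completed the study, which is the non-completion limitation noted above.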
Image: PD
©2026 2 Minute Medicine, Inc. All rights reserved. No works may be reproduced without expressed written consent from 2 Minute Medicine, Inc. Inquire about licensing here. No article should be construed as medical advice and is not intended as such by the authors or by 2 Minute Medicine, Inc.