1. Anderson and colleagues compared clinical staff response times to patient-sent messages routed by NLP labelling with response times to messages handled without NLP.
2. NLP routing reduced both the initial time needed to address a new patient message and the time to complete a patient conversation.
Evidence Rating Level: 2 (Good)
Study Rundown: Patients have increasingly used EHR messaging portals for care, but these messages are often directed initially to a central pool and then manually forwarded, sometimes repeatedly, to the appropriate staff. Anderson and colleagues developed an NLP model to label inbound messages with commonly encountered themes to reduce message response time. The model was trained on 40,000 EHR messages and classified each message into one of five categories: urgent, clinician, refill, schedule, or form. The model was then deployed in the clinical setting, and response metrics for NLP-routed messages were compared with those for a parallel sample of unrouted patient messages. The metrics were the time to first healthcare staff interaction, the time to conversation completion, and the total number of message interactions by all healthcare staff. The study found that the NLP-routed group required less time both to reach a healthcare staff member initially and to complete an entire conversation. The NLP model also labelled messages into the appropriate categories consistently and accurately. This study demonstrated that an NLP classifier in the EHR can decrease message response times and reduce the messaging burden on healthcare staff.
Click here to read the study in NEJM AI
Relevant Reading: Artificial Intelligence–Generated Draft Replies to Patient Inbox Messages
In-Depth [prospective cohort]: The NLP model was developed using a database of 40,000 EHR messages from adult patients. The messages were annotated by a study clinician into one of five categories: urgent, clinician, refill, schedule, or form. After development, the model was deployed at four outpatient sites. The intervention group comprised messages automatically routed by the NLP model, and the control group comprised a parallel sample of unrouted messages. Messages in both groups were extracted from the same sites within the same two-week period using identical inclusion and exclusion criteria. The primary metrics compared between the two groups were the time from patient message initiation to the first interaction (defined as any read, forward, or reply) by healthcare staff, the time from message initiation to conversation completion, and the total number of message interactions by all healthcare staff. Secondary metrics included the precision, recall, and accuracy of NLP message labelling. The time to first interaction was shorter in the intervention group (difference in medians, −1 hour; 95% confidence interval [CI], −1.42 to −0.5), as was the time to complete an entire conversation (difference in medians, −22.5 hours; 95% CI, −36.3 to −17.7). Staff in the intervention group also had fewer total message interactions than those in the control group (difference in medians, −2.0 interactions; 95% CI, −2.9 to −1.4). Additionally, the NLP model achieved precision, recall, and accuracy of 95.0% or above across all five categories. Overall, this study demonstrated the feasibility of using an NLP classifier to enhance operational efficiency and reduce administrative burden for healthcare staff.
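The secondary metrics above (per-class precision and recall, plus accuracy) can be illustrated with a minimal sketch. This is not the authors' evaluation code; the category names follow the study, but the helper function and example labels are invented for demonstration only:

```python
# Hypothetical illustration of per-class precision/recall and overall
# accuracy for a five-way message classifier. Example data is invented.

CATEGORIES = ["urgent", "clinician", "refill", "schedule", "form"]

def per_class_metrics(y_true, y_pred, labels):
    """Return ({label: (precision, recall)}, overall accuracy)."""
    metrics = {}
    for label in labels:
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # False positives: predicted this label when the truth was another.
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        # False negatives: this label was the truth but something else was predicted.
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[label] = (precision, recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return metrics, accuracy

# Toy example: six annotated messages versus the classifier's predictions.
true_labels = ["urgent", "refill", "refill", "schedule", "form", "clinician"]
pred_labels = ["urgent", "refill", "schedule", "schedule", "form", "clinician"]
metrics, accuracy = per_class_metrics(true_labels, pred_labels, CATEGORIES)
```

In the toy example, one "refill" message is misrouted to "schedule", so "refill" recall drops to 0.5 and "schedule" precision drops to 0.5, while overall accuracy is 5/6.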
Image: PD
©2025 2 Minute Medicine, Inc. All rights reserved. No works may be reproduced without expressed written consent from 2 Minute Medicine, Inc. Inquire about licensing here. No article should be construed as medical advice and is not intended as such by the authors or by 2 Minute Medicine, Inc.