Natural language processing may automate data extraction from radiologic reports

1. Natural language processing (NLP) applications may rapidly extract meaningful information from unstructured, free-text radiology reports for a variety of uses, including diagnostic surveillance, cohort building, quality assessment, and clinical support services.

2. NLP remains an underutilized technique for large-volume, automatic data extraction in both research and clinical practice environments, but it has been shown to reduce or obviate manual chart review in retrospective studies, among other uses.

Evidence Rating: 3 (Average)

Study Rundown: Natural language processing is a technique that algorithmically identifies unstructured text, the format commonly used in radiologic reports, and converts it into a structured form to extract valuable and actionable information for research, quality improvement, or clinical surveillance. NLP can structure a report by drawing upon a radiologic lexicon (e.g. RadLex) or other standardized database to output a particular type of information, such as whether the report contains a specific finding, condition, or outcome. Such automated processing of large, prose-style reports allows for the “mining” of information, permitting the extraction of a particular dataset for the rapid assessment of large volumes of data, as is performed in research, particularly retrospective review studies. This systematic review examined 67 relevant publications discussing different natural language processing techniques applied to radiologic information. Five major subgroups of natural language processing applications were identified: diagnostic surveillance, cohort building for epidemiologic studies, quality assessment of radiologic practice, query-based case retrieval, and clinical support services. Of particular interest, NLP was suggested as a method to monitor reporting in real time, delivering rapid, summarized data to the referring physician or other radiologists. Across all uses in radiology, NLP demonstrated high sensitivity and specificity for the detection of findings or recommendations, but the techniques from only 20 of the 67 reviewed studies had been put into daily operational use, which suggests that the generalizability of NLP techniques may be limited. As radiologic reports become more structured, future studies should attempt to validate center-specific results in larger health systems and to generalize them to larger populations and radiologic data sets.
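To make the lexicon-driven approach described above more concrete, the following is a minimal Python sketch of how a rule-based system might flag whether a free-text report mentions a finding. It is an illustration only and not the method of any reviewed study: the `LEXICON` entries, the `NEGATION_CUES` list, and the `extract_findings` function are hypothetical names invented for this example, and a real system would draw its terms from a standard vocabulary such as RadLex and use far more robust sentence and negation handling.

```python
import re

# Hypothetical mini-lexicon mapping a finding label to its surface terms.
# Assumption: a real system would source these from a vocabulary such as RadLex.
LEXICON = {
    "pulmonary_embolism": ["pulmonary embolism", "pulmonary embolus", "pulmonary emboli"],
    "pneumothorax": ["pneumothorax"],
}

# Hypothetical negation cues: a cue negates any term that follows it
# within the same sentence.
NEGATION_CUES = ["no", "without", "negative for", "no evidence of"]


def extract_findings(report: str) -> dict:
    """Return {finding: True/False} for each lexicon entry in a free-text report."""
    results = {finding: False for finding in LEXICON}
    # Naive sentence split; this would mishandle decimals such as "1.5 cm".
    sentences = re.split(r"[.;\n]+", report.lower())
    for sentence in sentences:
        for finding, terms in LEXICON.items():
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if not match:
                    continue
                # Look for a negation cue anywhere before the matched term.
                preceding = sentence[: match.start()]
                negated = any(
                    re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                    for cue in NEGATION_CUES
                )
                if not negated:
                    results[finding] = True
    return results


report = (
    "No evidence of pneumothorax. "
    "Filling defect consistent with pulmonary embolism in the right lower lobe."
)
print(extract_findings(report))
```

Run on the sample report, the sketch marks pulmonary embolism as present and pneumothorax as absent, which is the kind of structured, per-finding output that can then be aggregated across thousands of reports for cohort building or surveillance.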

Click to read the study in Journal of Radiology

Relevant reading: Mining electronic health records: towards better research applications and clinical care

In-Depth [systematic review]: A systematic review was performed of all natural language processing publications available within the MEDLINE and EMBASE databases up to 2014. The search identified 266 articles that were assessed for eligibility, and 67 studies were included for review. A total of 4 studies were also included after being identified through separate sources. The studies were additionally graded by level of operational use: level 1 (system development and validation, without discussion of operational use), level 2 (operational use discussed and anticipated but not yet realized), and level 3 (system in operational use), reflecting how far each natural language processing application had been incorporated into daily practice. In total, 34 studies were level 1, 13 were level 2, and 20 were level 3 applications of natural language processing. A total of 17 studies focused on diagnostic surveillance, 18 on cohort building for epidemiologic studies, 15 on quality assessment of radiologic practice, 7 on query-based case retrieval, and 10 on clinical support services. Overall, sensitivities ranged from 91% to 99%, with positive predictive values varying from 53% to 86% depending on the specific outcome task. The greatest improvement in performance came from systems that utilized semantic analysis (use of a specific lexicon to identify relevant terms), whereas syntactic analysis showed the least improvement. Rule-based algorithms and machine learning systems were similarly effective, and machine learning is becoming more prevalent due to its improved scalability. Despite the excellent performance of these systems, only 30% (20 of 67) of those reviewed are in daily operational use, and of these, only a single system has been implemented and validated in a clinical workflow.
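For readers less familiar with the performance measures reported above, the short Python sketch below shows how sensitivity and positive predictive value are computed from counts of true positives, false positives, and false negatives, and checks the operational-use percentage against the study counts. The counts `tp`, `fp`, and `fn` are hypothetical values chosen for illustration and do not come from the review; only the 20 of 67 figure is taken from the text.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reports that truly contain the finding and were flagged."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of flagged reports that truly contain the finding."""
    return tp / (tp + fp)


# Hypothetical counts for one extraction task (not data from the review).
tp, fp, fn = 95, 40, 5

print(f"Sensitivity: {sensitivity(tp, fn):.1%}")
print(f"Positive predictive value: {positive_predictive_value(tp, fp):.1%}")

# From the text: 20 of the 67 reviewed studies described systems in
# daily operational use (level 3).
print(f"Share in operational use: {20 / 67:.1%}")
```

With these illustrative counts the system misses few true findings (high sensitivity) yet a sizable share of its flags are false alarms (lower positive predictive value), the same pattern as the ranges reported in the review.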


©2016 2 Minute Medicine, Inc. All rights reserved. No works may be reproduced without expressed written consent from 2 Minute Medicine, Inc. Inquire about licensing here. No article should be construed as medical advice and is not intended as such by the authors or by 2 Minute Medicine, Inc.