Making Sense of Abbreviations in Nursing Notes for Mortality Prediction

Saturday, 27 July 2019

Jasmine Nakayama, BSN
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, USA

Background: Unstructured data from electronic health records (EHRs), such as nursing notes, could improve predictive models of health outcomes. Nursing notes contain valuable information about patient care but frequently include abbreviations. Abbreviation disambiguation, the process of resolving each abbreviation to its intended meaning, may make unstructured text more informative for predictive modeling.

Purpose: To evaluate a new abbreviation disambiguation method within a pipeline that extracts structured information from unstructured nursing notes to predict in-hospital and 30-day mortality.

Methods: Using de-identified EHR data, we developed a pipeline that (1) disambiguated abbreviations, (2) applied standard natural language processing preprocessing techniques, (3) performed sentiment analysis with Pattern for Python and VADER, and (4) reduced dimensionality with Latent Dirichlet Allocation (LDA) and doc2vec to construct features from nursing notes. Elixhauser Comorbidity Indices and the nursing note features were used to predict in-hospital mortality (3,036 patients) and 30-day mortality (1,123 patients). For each outcome, we compared logistic regression models built on LDA features with models built on doc2vec features. We also developed a nursing abbreviation resource and compared it with an existing resource, the Clinical Abbreviation Recognition and Disambiguation (CARD) framework.
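
The abstract does not describe how the disambiguation step works internally, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a small hand-built lookup table (the `ABBREVIATIONS` entries, cue words, and the `expand_abbreviations` function are all hypothetical) and picks an expansion for each abbreviation by counting cue words that appear elsewhere in the same note.

```python
import re

# Hypothetical lookup table (illustrative only, not the study's resource).
# Each abbreviation maps to candidate expansions paired with context cue words.
ABBREVIATIONS = {
    "pt": [
        ("patient", {"family", "resting", "alert"}),
        ("physical therapy", {"consult", "session", "ot"}),
    ],
    "sob": [("shortness of breath", set())],
    "prn": [("as needed", set())],
}

# Split a note into runs of letters, runs of other symbols, and whitespace,
# so the original spacing and punctuation can be reassembled unchanged.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z\s]+|\s+")


def expand_abbreviations(note, table=ABBREVIATIONS):
    """Replace known abbreviations with the best-matching expansion."""
    tokens = TOKEN.findall(note)
    words = {t.lower() for t in tokens if t.isalpha()}
    out = []
    for tok in tokens:
        candidates = table.get(tok.lower())
        if not candidates:
            out.append(tok)
            continue
        # Choose the candidate sharing the most cue words with the note;
        # on a tie, max() keeps the first candidate listed in the table.
        best = max(candidates, key=lambda c: len(c[1] & words))
        out.append(best[0])
    return "".join(out)


print(expand_abbreviations("Pt c/o SOB, O2 given prn."))
print(expand_abbreviations("PT consult ordered."))
```

In this toy version, "Pt" falls back to "patient" when no cue word is present and resolves to "physical therapy" when "consult" appears in the note. A real resource would need far more entries and a more robust notion of context.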

Results: Models using abbreviation-normalized nursing notes achieved higher AUCs for both in-hospital and 30-day mortality than models using non-normalized notes. For in-hospital mortality, the LDA and doc2vec models had AUCs of 0.8129 and 0.7574, respectively. For 30-day mortality, the LDA and doc2vec models had AUCs of 0.7963 and 0.7809, respectively. Compared with the CARD framework, our nursing abbreviation resource had less noise and ambiguity but detected fewer abbreviations.
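
For readers unfamiliar with the metric, the sketch below shows one way two models can be compared by AUC. It assumes AUC here means the area under the ROC curve (the abstract does not say so explicitly) and computes it as the fraction of positive-negative pairs that a model ranks correctly. The `auc` function, the labels, and the scores are made-up toy values for illustration and are not data from this study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Counts, over all (positive, negative) pairs, how often the positive
    case receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = died, 0 = survived.
labels = [1, 1, 0, 0, 0, 1]
scores_without_expansion = [0.9, 0.4, 0.3, 0.5, 0.2, 0.8]
scores_with_expansion = [0.9, 0.7, 0.3, 0.5, 0.2, 0.8]

print(auc(labels, scores_without_expansion))
print(auc(labels, scores_with_expansion))
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; a library routine would normally be used on cohorts of the size reported here.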

Conclusions: Our results indicate that disambiguating abbreviations in nursing notes before topic modeling and sentiment analysis improved prediction of in-hospital and 30-day mortality when controlling for comorbidity.

Implications: Analyzing unstructured clinical data from EHRs may provide deeper insight into patient health outcomes. Incorporating unstructured text into predictive models may support earlier identification of at-risk patients who could benefit from early intervention.