Finding falls in ambulatory care clinical documents using statistical text mining |
| |
Authors: | James A McCart, Donald J Berndt, Jay Jarman, Dezon K Finch, Stephen L Luther |
| |
Institution: | 1. Consortium for Healthcare Informatics Research (CHIR) and the HSR&D/RR&D Center of Excellence: Maximizing Rehabilitation Outcomes, James A Haley Veterans’ Hospital, Tampa, Florida, USA; 2. Consortium for Healthcare Informatics Research (CHIR) and the University of South Florida College of Business, Tampa, Florida, USA; 3. Consortium for Healthcare Informatics Research (CHIR) and East Tennessee State University College of Business and Technology, Johnson City, Tennessee, USA |
| |
Abstract: | Objective: To determine how well statistical text mining (STM) models can identify falls within clinical text associated with an ambulatory encounter. Materials and Methods: 2241 patients were selected who had a fall-related ICD-9-CM E-code or a matched injury diagnosis code recorded while being treated as outpatients at one of four sites within the Veterans Health Administration. All clinical documents within a 48-h window of the recorded E-code or injury diagnosis code for each patient were obtained (n=26 010; 611 distinct document titles) and annotated for falls. Logistic regression, support vector machine (SVM), and cost-sensitive support vector machine (SVM-cost) models were trained on a stratified sample of 70% of documents from one location (dataset Atrain) and then applied to the remaining unseen documents (datasets Atest–D). Results: All three STM models obtained area under the receiver operating characteristic curve (AUC) scores above 0.950 on the four test datasets (Atest–D). The SVM-cost model obtained the highest AUC scores, ranging from 0.953 to 0.978. The SVM-cost model also achieved F-measure values ranging from 0.745 to 0.853, sensitivity from 0.890 to 0.931, and specificity from 0.877 to 0.944. Discussion: The STM models performed well across a large, heterogeneous collection of document titles. The models also generalized to other sites, including a traditionally bilingual site that had distinctly different grammatical patterns. Conclusions: The results of this study suggest that STM-based models have the potential to improve surveillance of falls. Furthermore, the encouraging evidence shown here that STM is a robust technique for mining clinical documents bodes well for other surveillance-related topics. |
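The abstract reports each model's performance as AUC, F-measure, sensitivity, and specificity. As a rough illustration of how these quantities relate to a classifier's per-document scores, here is a minimal stdlib-only Python sketch. The function names, the 0.5 decision threshold, and the reading of "F-measure" as the balanced F1 score are illustrative assumptions, not the authors' evaluation code. AUC is computed here as the probability that a randomly chosen fall document outscores a randomly chosen non-fall document, with ties counting one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting 'fall' (1) when score >= threshold.

    The 0.5 threshold is an illustrative assumption.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, precision, and F-measure at one threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on fall documents
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-fall documents
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Assumption: F-measure taken as F1, the harmonic mean of precision and sensitivity.
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f_measure": f_measure}


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: fraction of (positive, negative) pairs ranked correctly, ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels (1 = fall document) and model scores; not data from the study.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
    print(threshold_metrics(labels, scores))
    print(auc(labels, scores))
```

On the toy data, the threshold metrics come out as sensitivity 2/3, specificity 0.8, and F-measure 2/3, and the AUC is 14/15. This shows why the two kinds of measure can diverge: AUC summarizes ranking quality across all thresholds, while sensitivity, specificity, and F-measure depend on the single operating point chosen.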
| |
Keywords: | Text Mining; Accidental Falls; Electronic Health Records; Ambulatory Care |