Searching the PDF Haystack: Automated Knowledge Discovery in Scanned EHR Documents

Authors:	Alexander L Kostrinsky-Thomas, Fuki M Hisama, Thomas H Payne

Institution:	1. College of Osteopathic Medicine, Pacific Northwest University of Health Sciences, 200 University Pkwy, Yakima, Washington, United States; 2. Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, Washington, United States; 3. Department of Medicine, University of Washington School of Medicine, Seattle, Washington, United States

Abstract:	Background: Clinicians express concern that they may be unaware of important information in the voluminous scanned and other outside documents stored in electronic health records (EHRs). An example is “unrecognized EHR risk factor information,” defined as risk factors for heritable cancer that exist within a patient's EHR but are not known to current treating providers. In a related study using manual EHR chart review, we found that half of the women whose EHR contained risk factor information met criteria for further genetic risk evaluation for heritable forms of breast and ovarian cancer. They were not referred for genetic counseling. Objectives: The purpose of this study was to compare automated methods (optical character recognition with natural language processing) with human review in their ability to identify risk factors for heritable breast and ovarian cancer within EHR scanned documents. Methods: We evaluated the accuracy of an automated method against our criterion standard (physician chart review). The automated method combined Amazon's Textract service (Amazon.com, Seattle, Washington, United States), the Clinical Language Annotation, Modeling, and Processing toolkit (CLAMP) (Center for Computational Biomedicine at The University of Texas Health Science, Houston, Texas, United States), and a custom-written Java application. Results: We found that automated methods identified most cancer risk factor information that would otherwise require clinician manual review and is therefore at risk of being missed. Conclusion: The use of automated methods for identification of heritable risk factors within EHRs may provide an accurate yet rapid review of patients' past medical histories. These methods could be further strengthened via improved analysis of handwritten notes, tables, and colloquial phrases.

Keywords:	electronic health records; portable document format; optical character recognition; natural language processing; machine learning; evaluation
