

Application of optical character recognition with natural language processing for large-scale quality metric data extraction in colonoscopy reports
Authors: Sobia Nasir Laique, Umar Hayat, Shashank Sarvepalli, Byron Vaughn, Mounir Ibrahim, John McMichael, Kanza Noor Qaiser, Carol Burke, Amit Bhatt, Colin Rhodes, Maged K Rizk
Institution: 1. Division of Gastroenterology and Hepatology, Mayo Clinic, Phoenix, Arizona, USA; 2. Division of Gastroenterology, University of Minnesota, Minneapolis, Minnesota, USA; 3. Digestive Disease Institute, Cleveland Clinic, Cleveland, Ohio, USA; 4. Department of Hospital Medicine, Cleveland Clinic, Cleveland, Ohio, USA; 5. Department of Bioinformatics, Vanderbilt University, Nashville, Tennessee, USA; 6. eHealth Technology, West Henrietta, New York, USA
Abstract:
Background and Aims: Colonoscopy is commonly performed for colorectal cancer screening in the United States. Reports are often generated in a non-standardized format and are not always integrated into electronic health records. Thus, this information is not readily available for streamlining quality management, participating in endoscopy registries, or reporting of patient- and center-specific risk factors predictive of outcomes. We aim to demonstrate the use of a new hybrid approach using natural language processing of charts that have been elucidated with optical character recognition processing (OCR/NLP hybrid) to obtain relevant clinical information from scanned colonoscopy and pathology reports, a technology co-developed by Cleveland Clinic and eHealth Technologies (West Henrietta, NY, USA).
Methods: This was a retrospective study conducted at Cleveland Clinic, Cleveland, Ohio, and the University of Minnesota, Minneapolis, Minnesota. A randomly sampled list of outpatient screening colonoscopy procedures and pathology reports was selected. Desired variables were then collected. Two researchers first manually reviewed the reports for the desired variables. Then, the OCR/NLP algorithm was used to obtain the same variables from 3 electronic health records in use at our institution: Epic (Verona, Wisc, USA), ProVation (Minneapolis, Minn, USA) used for endoscopy reporting, and Sunquest PowerPath (Tucson, Ariz, USA) used for pathology reporting.
Results: Compared with manual data extraction, the accuracy of the hybrid OCR/NLP approach to detect polyps was 95.8%, adenomas 98.5%, sessile serrated polyps 99.3%, advanced adenomas 98%, inadequate bowel preparation 98.4%, and failed cecal intubation 99%. Comparison of the dataset collected via NLP alone with that collected using the hybrid OCR/NLP approach showed that the accuracy for almost all variables was >99%.
Conclusions: Our study is the first to validate the use of a unique hybrid OCR/NLP technology to extract desired variables from scanned procedure and pathology reports contained in image format with an accuracy >95%.
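Illustrative note: the accuracy figures above compare algorithm output with manual review, variable by variable. A minimal Python sketch of that kind of comparison follows. It assumes accuracy means the fraction of reports on which the two sources agree; the abstract does not spell out the exact formula. The function name, variable names, and toy records are all hypothetical and are not study data.

```python
from typing import Dict, List

Report = Dict[str, bool]  # one report: variable name -> finding present?


def per_variable_accuracy(manual: List[Report],
                          extracted: List[Report]) -> Dict[str, float]:
    """Fraction of reports on which the extracted value matches manual review.

    Assumption: accuracy = agreements / total reports, per variable.
    """
    if not manual or len(manual) != len(extracted):
        raise ValueError("need two non-empty, equally long lists of reports")
    scores = {}
    for variable in manual[0]:
        agreements = sum(
            1 for m, e in zip(manual, extracted) if m[variable] == e[variable]
        )
        scores[variable] = agreements / len(manual)
    return scores


# Hypothetical toy records (not from the study): four reports, two variables.
manual_review = [
    {"polyp": True, "adenoma": True},
    {"polyp": False, "adenoma": False},
    {"polyp": True, "adenoma": False},
    {"polyp": True, "adenoma": True},
]
algorithm_output = [
    {"polyp": True, "adenoma": True},
    {"polyp": False, "adenoma": False},
    {"polyp": False, "adenoma": False},  # disagrees with manual on "polyp"
    {"polyp": True, "adenoma": True},
]

scores = per_variable_accuracy(manual_review, algorithm_output)
print(scores)
```

In this toy example the two sources disagree on one of four reports for the polyp variable and on none for the adenoma variable.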
Keywords: ACG, American College of Gastroenterology; ADR, adenoma detection rate; ASGE, American Society for Gastrointestinal Endoscopy; CRC, colorectal cancer; EHR, electronic health record; NLP, natural language processing; OCR, optical character recognition; PPV, positive predictive value; SQL, Structured Query Language; SSP, sessile serrated polyp
This article is indexed in ScienceDirect and other databases.