A supervised framework for resolving coreference in clinical records |
| |
Authors: | Bryan Rink, Kirk Roberts, Sanda M. Harabagiu |
| |
Affiliation: | Human Language Technology Research Institute, University of Texas at Dallas, Richardson, Texas, USA. |
| |
Abstract: | Objective: A method for the automatic resolution of coreference between medical concepts in clinical records. Materials and methods: A multiple pass sieve approach utilizing support vector machines (SVMs) at each pass was used to resolve coreference. Information such as lexical similarity, recency of a concept mention, synonymy based on Wikipedia redirects, and local lexical context was used to inform the method. Results were evaluated using an unweighted average of the MUC, CEAF, and B3 coreference evaluation metrics. The datasets used in these experiments were made available through the 2011 i2b2/VA Shared Task on Coreference. Results: The method achieved an average F score of 0.821 on the ODIE dataset, with a precision of 0.802 and a recall of 0.845. These results compare favorably to the best-performing system, with a reported F score of 0.827 on the dataset, and to the median system F score of 0.800 among the eight teams that participated in the 2011 i2b2/VA Shared Task on Coreference. On the i2b2 dataset, the method achieved an average F score of 0.906, with a precision of 0.895 and a recall of 0.918, compared to the best F score of 0.915 and the median of 0.859 among the 16 participating teams. Discussion: Post hoc analysis revealed significant performance degradation on pathology reports. The pathology reports were characterized by complex synonymy and very few patient mentions. Conclusion: The use of several simple lexical matching methods had the most impact on achieving competitive performance on the task of coreference resolution. Moreover, the ability to detect patients in electronic medical records helped to improve coreference resolution more than other linguistic analysis. |
| |
Keywords: | Natural language processing; clinical informatics; medical records systems, computerized; semantic relations; statistical learning; machine learning; predictive modeling; privacy technology |
This article is indexed in PubMed and other databases. |
|