Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules
Authors: Jonnalagadda Siddhartha Reddy, Li Dingcheng, Sohn Sunghwan, Wu Stephen Tze-Inn, Wagholikar Kavishwar, Torii Manabu, Liu Hongfang
Affiliation: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA.
Abstract:

Objective

This paper describes the coreference resolution system submitted by Mayo Clinic for the 2011 i2b2/VA/Cincinnati shared task Track 1C. The goal of the task was to construct a system that links the markables corresponding to the same entity.

Materials and methods

The task organizers provided progress notes and discharge summaries annotated with five types of markables: treatment, problem, test, person, and pronoun. We used a multi-pass sieve algorithm that applies deterministic rules in decreasing order of precision while simultaneously gathering information about the entities in the documents. Our system, MedCoref, also offers a state-of-the-art machine learning framework as an alternative to the final, rule-based pronoun resolution sieve.
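The multi-pass sieve idea can be illustrated with a minimal Python sketch. It is not the MedCoref implementation: the two rules (`exact_match_sieve`, `head_match_sieve`), the `Markable` fields, and the `Clusters` helper are assumptions made for illustration only. What it shows is the control flow: passes run from most to least precise, and each pass sees the clusters built by the passes before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markable:
    idx: int        # position in document order (assumed to equal the list index)
    text: str       # non-empty surface string
    sem_type: str   # "problem", "treatment", "test", "person", or "pronoun"


class Clusters:
    """Union-find over markable indices; stores the entity information shared by all passes."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[max(ri, rj)] = min(ri, rj)

    def chains(self):
        groups = {}
        for i in range(len(self.parent)):
            groups.setdefault(self.find(i), []).append(i)
        return sorted(g for g in groups.values() if len(g) > 1)


def exact_match_sieve(m, ante):
    # Hypothetical high-precision rule: same type, identical string (case-insensitive).
    return (m.sem_type != "pronoun" and m.sem_type == ante.sem_type
            and m.text.lower() == ante.text.lower())


def head_match_sieve(m, ante):
    # Hypothetical lower-precision rule: same type, same last token.
    return (m.sem_type != "pronoun" and m.sem_type == ante.sem_type
            and m.text.lower().split()[-1] == ante.text.lower().split()[-1])


def run_sieves(markables, sieves):
    clusters = Clusters(len(markables))
    for sieve in sieves:                              # most precise pass first
        for m in markables:
            for ante in reversed(markables[:m.idx]):  # nearest antecedent first
                if clusters.find(m.idx) == clusters.find(ante.idx):
                    continue                          # already linked by an earlier pass
                if sieve(m, ante):
                    clusters.union(m.idx, ante.idx)
                    break
    return clusters.chains()


doc = [
    Markable(0, "chest pain", "problem"),
    Markable(1, "the patient", "person"),
    Markable(2, "Chest pain", "problem"),
    Markable(3, "the pain", "problem"),
    Markable(4, "the patient", "person"),
]
chains = run_sieves(doc, [exact_match_sieve, head_match_sieve])
```

In this toy document the exact-match pass links markables 0 and 2 and markables 1 and 4; the looser head-match pass then attaches markable 3 to the existing "chest pain" chain.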

Results

The best-performing multi-pass sieve configuration achieved an overall score (the average of the B3, MUC, BLANC, and CEAF F-scores) of 0.836 on the training set and 0.843 on the test set.
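To make the scoring concrete, the sketch below computes a basic mention-level B3 score and the unweighted mean of several F-scores. It is an illustration under assumptions: the official shared-task scorer may handle singletons and mismatched mentions differently, the function names are hypothetical, and the chains and per-metric values are toy placeholders rather than results from the paper.

```python
def b_cubed(key, response):
    """Basic B3 precision, recall, and F over mentions found in both partitions."""
    key_of = {m: frozenset(c) for c in key for m in c}
    resp_of = {m: frozenset(c) for c in response for m in c}
    mentions = key_of.keys() & resp_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0
    # Per-mention overlap between its gold chain and its predicted chain.
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def overall_score(f_scores):
    """Unweighted mean of the per-metric F-scores."""
    return sum(f_scores.values()) / len(f_scores)


# Toy gold and predicted chains over mention ids 1..5 (not data from the paper).
gold = [{1, 2, 3}, {4, 5}]
predicted = [{1, 2}, {3, 4, 5}]
precision, recall, f_b3 = b_cubed(gold, predicted)

# Placeholder per-metric values, used only to show the averaging step.
toy_mean = overall_score({"B3": 0.5, "MUC": 0.7, "BLANC": 0.6, "CEAF": 0.8})
```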

Discussion

A supervised machine learning system, which typically uses a single function to find coreferents, cannot accommodate the irregularities encountered in the data, especially when the number of training examples is insufficient. A completely deterministic system, on the other hand, can lose recall (sensitivity) when its rules are not exhaustive. The sieve-based framework allows one to combine reliable machine learning components with rules designed by experts.
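One way to picture this combination is a final pronoun-resolution step whose resolver can be swapped between a rule and a learned scorer. The sketch below is an assumption-laden illustration, not the MedCoref design: the mention dictionaries, `rule_based_resolver`, `make_model_resolver`, and the stand-in `toy_score` function are all hypothetical, and a real system would plug in a trained classifier where `toy_score` appears.

```python
PERSONAL_PRONOUNS = {"he", "she", "him", "her", "his"}


def rule_based_resolver(pronoun, candidates):
    """Hypothetical rule: link a personal pronoun to the nearest preceding person."""
    if pronoun["text"].lower() not in PERSONAL_PRONOUNS:
        return None
    for i in range(len(candidates) - 1, -1, -1):
        if candidates[i]["type"] == "person":
            return i
    return None


def make_model_resolver(score, threshold=0.5):
    """Wrap any pairwise scoring function as a resolver with the same interface."""
    def resolve(pronoun, candidates):
        best_i, best_s = None, threshold
        for i, cand in enumerate(candidates):
            s = score(pronoun, cand)
            if s >= best_s:          # ties go to the later (nearer) candidate
                best_i, best_s = i, s
        return best_i
    return resolve


def resolve_pronouns(mentions, resolver):
    """Run the chosen resolver over every pronoun; returns {pronoun_index: antecedent_index}."""
    links = {}
    for i, m in enumerate(mentions):
        if m["type"] == "pronoun":
            j = resolver(m, mentions[:i])
            if j is not None:
                links[i] = j
    return links


mentions = [
    {"text": "the patient", "type": "person"},
    {"text": "aspirin", "type": "treatment"},
    {"text": "she", "type": "pronoun"},
    {"text": "it", "type": "pronoun"},
]


def toy_score(pronoun, cand):
    # Stand-in for a trained model's probability; values are arbitrary.
    return 0.9 if pronoun["text"] == "it" and cand["type"] == "treatment" else 0.1


rule_links = resolve_pronouns(mentions, rule_based_resolver)
model_links = resolve_pronouns(mentions, make_model_resolver(toy_score))
```

Because both resolvers share one call signature, the earlier high-precision passes are unaffected by which one is chosen for the last step.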

Conclusion

Using relatively simple rules, part-of-speech information, and semantic type properties, an effective coreference resolution system can be designed. The source code of the system described is available at https://sourceforge.net/projects/ohnlp/files/MedCoref.
Keywords: natural language processing; machine learning; information extraction; electronic medical record; coreference resolution; text mining; computational linguistics; named entity recognition; distributional semantics; relationship extraction; information storage and retrieval (text and images)
This article is indexed in PubMed and other databases.