The MITRE Corporation, Bedford, Massachusetts, USA
Abstract:
Objective
To describe a system for determining the assertion status of medical problems mentioned in clinical reports, which was entered in the 2010 i2b2/VA community evaluation ‘Challenges in natural language processing for clinical data’ for the task of classifying assertions associated with problem concepts extracted from patient records.
Materials and methods
A combination of machine learning (conditional random field and maximum entropy) and rule-based (pattern matching) techniques was used to detect negation, speculation, and hypothetical and conditional information, as well as information associated with persons other than the patient.
Results
The best submission obtained an overall micro-averaged F-score of 0.9343.
Conclusions
Using semantic attributes of concepts and information about document structure as features for statistical classification of assertions is a good way to leverage rule-based and statistical techniques. In this task, the choice of features may be more important than the choice of classifier algorithm.