Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification |
| |
Authors: | Sunghwan Sohn Kavishwar B Wagholikar Dingcheng Li Siddhartha R Jonnalagadda Cui Tao Ravikumar Komandur Elayavilli Hongfang Liu |
| |
Institution: | Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA |
| |
Abstract: | Background: Temporal information detection systems were developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge. Objective: To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text. Materials and methods: The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features, including lexical information, natural language elements, and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern matching and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in an Apache Unstructured Information Management Architecture framework. Results: Our TIMEX3 system performed the best among the challenge teams (F-measure of 0.900, value accuracy of 0.731). The Event system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537. Conclusions: Our TIMEX3 system demonstrated that regular expression rules can extract and normalize time information well. The Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance. |
| |
Keywords: | |
|
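As a rough illustration of the rule-based TIMEX3 approach summarized in the abstract (regular expression matching followed by normalization), the sketch below extracts two date formats and maps each to an ISO 8601 value. It is not the authors' implementation: the patterns, the `extract_dates` function name, and the US month/day ordering are assumptions made for this example, and a real system would need many more rules.

```python
import re
from datetime import date

# Hypothetical rule set for illustration only; the abstract does not list
# the actual patterns used by the TIMEX3 system.
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Assumption: numeric dates are written month/day/year (US convention).
NUMERIC_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
WRITTEN_DATE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b",
    re.IGNORECASE)


def _to_iso(year, month, day):
    """Return an ISO 8601 date string, or None if the date is invalid."""
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def extract_dates(text):
    """Return (matched text, normalized value) pairs in order of appearance."""
    found = []
    for m in NUMERIC_DATE.finditer(text):
        month, day, year = (int(g) for g in m.groups())
        found.append((m.start(), m.group(0), _to_iso(year, month, day)))
    for m in WRITTEN_DATE.finditer(text):
        month = MONTHS[m.group(1).lower()]
        value = _to_iso(int(m.group(3)), month, int(m.group(2)))
        found.append((m.start(), m.group(0), value))
    # Drop matches that do not form a valid calendar date, then sort by offset.
    valid = [item for item in found if item[2] is not None]
    return [(span, value) for _, span, value in sorted(valid)]


if __name__ == "__main__":
    note = "Admitted on 03/04/2012 and discharged March 9, 2012."
    for span, value in extract_dates(note):
        print(span, "->", value)
```

Separating pattern matching from normalization keeps each rule small and lets invalid matches be discarded before a normalized value is reported.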
|