Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries

Authors:	Xu Yan; Hong Kai; Tsujii Junichi; Chang Eric I-Chao

Institution:	State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics and Mechanobiology of the Ministry of Education, Beihang University, Beijing, China.

Abstract:	Objective: A system that translates narrative text in the medical domain into a structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.
Design: The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts, using a dosage-unit dictionary to switch dynamically between two models based on Conditional Random Fields (CRF), (4) classifying assertions by voting among five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.
Measurements: Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate the results.
Results: The performance is competitive with state-of-the-art systems, with a micro-averaged F-measure of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.
Conclusions: The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching between models, one of which is specially designed for telegraphic sentences, significantly improved extraction of the treatment concept. In assertion classification, a set of features derived from a rule-based classifier proved effective for classes such as conditional and possible, which would suffer from data scarcity in conventional machine-learning methods. In relation identification, we use a two-stage architecture, the second stage of which applies pairwise classifiers to the possible candidate classes. This architecture significantly improves performance.
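
The evaluation uses both micro-averaged and macro-averaged precision, recall and F-measure. The following is a minimal sketch of how the two averages differ. It is not the authors' evaluation code. The class names and counts are hypothetical toy values, and the macro F-measure is taken here as the mean of the per-class F-measures, which is one common convention and an assumption about the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one class
Counts = Tuple[int, int, int]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts; 0.0 where undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Pool the counts over all classes, then compute the metrics once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


def macro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Compute the metrics per class, then take the unweighted mean."""
    per_class = [prf(*c) for c in counts.values()]
    n = len(per_class)
    return (
        sum(m[0] for m in per_class) / n,
        sum(m[1] for m in per_class) / n,
        sum(m[2] for m in per_class) / n,
    )


# Hypothetical toy counts: one frequent class and one rare class.
toy_counts: Dict[str, Counts] = {
    "frequent_class": (90, 10, 10),
    "rare_class": (1, 1, 3),
}

micro_p, micro_r, micro_f = micro_average(toy_counts)
macro_p, macro_r, macro_f = macro_average(toy_counts)
print(f"micro F = {micro_f:.4f}, macro F = {macro_f:.4f}")
```

With these toy counts the micro-averaged F-measure is dominated by the frequent class, while the macro average weights the rare class equally and comes out much lower. This is why sparse classes such as conditional and possible matter more under macro averaging.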

Keywords:	text processing; natural language processing; medical records
This article is indexed in PubMed and other databases.
