

Research and applications: Automatic lymphoma classification with sentence subgraph mining from pathology reports
Authors: Yuan Luo, Aliyah R Sohani, Ephraim P Hochberg, Peter Szolovits
Institution: 1. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; 2. Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Cambridge, Massachusetts, USA; 3. Center for Lymphoma, Massachusetts General Hospital, Cambridge, Massachusetts, USA; 4. Department of Medicine, Harvard Medical School, Cambridge, Massachusetts, USA
Abstract:

Objective

Pathology reports are rich in narrative statements that encode a complex web of relations among medical concepts. Doctors routinely use these relations to reason about diagnoses, but extracting them into prespecified forms for computational disease modeling often requires hand-crafted rules or supervised learning. We aim to automatically capture relations from narrative text without supervision.

Methods

We design a novel framework that translates sentences into graph representations, automatically mines sentence subgraphs, reduces redundancy in the mined subgraphs, and automatically generates subgraph features for subsequent classification tasks. To ensure meaningful interpretations over the sentence graphs, we use the Unified Medical Language System Metathesaurus to map token subsequences to concepts and, in turn, to sentence graph nodes. We test our system on multiple lymphoma classification tasks that together mimic the differential diagnosis performed by a pathologist. To this end, we prevent our classifiers from looking at explicit mentions or synonyms of lymphomas in the text.
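
The following is a minimal Python sketch of the general idea of mining frequent sentence subgraphs and turning them into binary features. It is not the authors' implementation. It assumes that sentences have already been parsed and mapped to concepts, that each sentence graph is a set of labeled edges, and that two subgraphs match when their labeled edges are identical. The edge labels and concept names are hypothetical toy values, the function names are illustrative, and the redundancy reduction step is omitted.

```python
from collections import Counter
from itertools import combinations

# A sentence graph is a set of labeled edges: (head, relation, dependent).
# All concepts and relation labels below are hypothetical toy values.
g1 = {("cells", "amod", "large"), ("cells", "amod", "atypical"),
      ("positive", "nsubj", "cells"), ("positive", "prep_for", "CD20")}
g2 = {("cells", "amod", "large"),
      ("positive", "nsubj", "cells"), ("positive", "prep_for", "CD20")}
g3 = {("cells", "amod", "small"), ("negative", "prep_for", "CD20")}


def connected(edges):
    """Return True if the given edges form a single connected component."""
    edges = list(edges)
    seen = {edges[0][0], edges[0][2]}  # endpoints of the first edge
    remaining = edges[1:]
    changed = True
    while changed and remaining:
        changed = False
        for edge in list(remaining):
            if edge[0] in seen or edge[2] in seen:
                seen.update((edge[0], edge[2]))
                remaining.remove(edge)
                changed = True
    return not remaining


def subgraphs(graph, max_edges=2):
    """Yield connected subgraphs with at most max_edges edges.

    Each subgraph is a sorted tuple of edges, which serves as its
    canonical key under the exact-label matching assumption.
    """
    for k in range(1, max_edges + 1):
        for combo in combinations(sorted(graph), k):
            if connected(combo):
                yield combo


def mine_frequent(graphs, min_support=2, max_edges=2):
    """Keep subgraphs that occur in at least min_support sentence graphs."""
    support = Counter()
    for graph in graphs:
        support.update(set(subgraphs(graph, max_edges)))
    return sorted(sg for sg, count in support.items() if count >= min_support)


def featurize(graph, vocabulary, max_edges=2):
    """Binary feature vector: 1 if the subgraph occurs in the sentence graph."""
    present = set(subgraphs(graph, max_edges))
    return [1 if sg in present else 0 for sg in vocabulary]


vocabulary = mine_frequent([g1, g2, g3])
features = [featurize(g, vocabulary) for g in (g1, g2, g3)]
```

In this toy setting, the resulting feature vectors could be passed to any standard classifier. A realistic system would need a proper frequent subgraph mining algorithm that handles graph isomorphism and scales beyond two-edge patterns.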

Results and Conclusions

We compare our system with three baseline classifiers that use standard n-grams, full MetaMap concepts, and filtered MetaMap concepts. Our system achieves high F-measures on multiple binary classifications of lymphoma (Burkitt lymphoma, 0.8; diffuse large B-cell lymphoma, 0.909; follicular lymphoma, 0.84; Hodgkin lymphoma, 0.912). Significance tests show that our system outperforms all three baselines. Moreover, feature analysis identifies subgraph features that contribute to improved performance; these features agree with current knowledge about lymphoma classification. We also highlight how these unsupervised relation features may provide meaningful insights into lymphoma classification.
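
For readers unfamiliar with the metric, the F-measure is commonly defined as the weighted harmonic mean of precision and recall. The sketch below shows that standard definition in Python. The function name and the counts in the usage line are hypothetical and are not taken from the study; the abstract does not state which weighting was used, so the balanced case (beta = 1) is assumed here.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positive, false positive, and false negative counts.

    beta = 1 weights precision and recall equally (assumed here).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one binary classification task.
example_score = f_measure(tp=8, fp=2, fn=2)
```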
Keywords: Automatic lymphoma classification; Sentence subgraph mining; Pathology reports; Natural language processing