Automatic classification of mammography reports by BI-RADS breast tissue composition class |
Authors: | Percha, Bethany; Nassif, Houssam; Lipson, Jafi; Burnside, Elizabeth; Rubin, Daniel |
Affiliation: | Biomedical Informatics Program, Stanford University, Stanford, California 94305-5488, USA. |
Abstract: | Because breast tissue composition partially predicts breast cancer risk, classifying mammography reports by breast tissue composition is important from both a scientific and a clinical perspective. We present a method that uses the unstructured text of mammography reports to classify them into BI-RADS breast tissue composition categories. The algorithm applies regular expressions to the report text and assigns each report to exactly one BI-RADS composition class: 'fatty', 'fibroglandular', 'heterogeneously dense', 'dense', or 'unspecified'. We evaluated its performance on mammography reports from two institutions: on a test set of reports from the Marshfield Clinic (Wisconsin) and Stanford University, the method achieves >99% classification accuracy. Because large-scale studies of breast cancer rely heavily on breast tissue composition information, this method could facilitate such research by helping investigators mine large datasets for correlations between breast composition and other covariates. |
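The abstract describes the classifier only at a high level: regular expressions map free-text report wording to one of five labels, and accuracy is measured against reference labels. The sketch below illustrates that idea under stated assumptions. The specific patterns, their priority order, and the names `classify_report` and `accuracy` are hypothetical illustrations, not the authors' published rule set, which the abstract does not give.

```python
import re
from typing import Sequence

# Hypothetical rule set for illustration only; these are NOT the published
# patterns. Rules are tried in this fixed priority order and the first match
# wins, so every report receives exactly one label.
_RULES = [
    ("heterogeneously dense", r"\bheterogeneously\s+dense\b"),
    ("dense", r"\b(?:extremely|very)\s+dense\b|\bbreasts?\s+(?:is|are)\s+dense\b"),
    ("fibroglandular", r"\bscattered\s+(?:areas\s+of\s+)?fibroglandular\b"),
    ("fatty", r"\balmost\s+entirely\s+fat(?:ty)?\b|\bfatty\s+replaced\b"),
]

# Compile once, case-insensitively, since report capitalization varies.
_COMPILED = [(label, re.compile(pattern, re.IGNORECASE)) for label, pattern in _RULES]


def classify_report(text: str) -> str:
    """Return one composition label, or 'unspecified' if no rule fires."""
    for label, pattern in _COMPILED:
        if pattern.search(text):
            return label
    return "unspecified"


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reports whose predicted label equals the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)
```

For example, `classify_report("The breasts are heterogeneously dense.")` returns `'heterogeneously dense'`, and a report with no composition statement falls through to `'unspecified'`. A fixed priority list is one simple way to guarantee a single label per report; how the authors resolved reports that match several classes, and whether they handled negation or institution-specific phrasing, is not stated in the abstract, so treat those choices as assumptions.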
Keywords: | Mammography; natural language processing; data mining; radiology; breast; text mining; machine learning; pharmacogenomics; breast imaging; quality; outcomes research |
This article is indexed in PubMed and other databases. |