共查询到7条相似文献,搜索用时 0 毫秒
1.
Braja G Patra Mohit M Sharma Veer Vekaria Prakash Adekkanattu Olga V Patterson Benjamin Glicksberg Lauren A Lepow Euijung Ryu Joanna M Biernacka Alona Furmanchuk Thomas J George William Hogan Yonghui Wu Xi Yang Jiang Bian Myrna Weissman Priya Wickramaratne J John Mann Mark Olfson Thomas R Campion Jr Mark Weiner Jyotishman Pathak 《J Am Med Inform Assoc》2021,28(12):2716
ObjectiveSocial determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs.Materials and MethodsA broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review.ResultsSmoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9).ConclusionNLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems. 相似文献
2.
ObjectiveThe study sought to develop and evaluate a knowledge-based data augmentation method to improve the performance of deep learning models for biomedical natural language processing by overcoming training data scarcity.Materials and MethodsWe extended the easy data augmentation (EDA) method for biomedical named entity recognition (NER) by incorporating the Unified Medical Language System (UMLS) knowledge and called this method UMLS-EDA. We designed experiments to systematically evaluate the effect of UMLS-EDA on popular deep learning architectures for both NER and classification. We also compared UMLS-EDA to BERT.ResultsUMLS-EDA enables substantial improvement for NER tasks from the original long short-term memory conditional random fields (LSTM-CRF) model (micro-F1 score: +5%, + 17%, and +15%), helps the LSTM-CRF model (micro-F1 score: 0.66) outperform LSTM-CRF with transfer learning by BERT (0.63), and improves the performance of the state-of-the-art sentence classification model. The largest gain on micro-F1 score is 9%, from 0.75 to 0.84, better than classifiers with BERT pretraining (0.82).ConclusionsThis study presents a UMLS-based data augmentation method, UMLS-EDA. It is effective at improving deep learning models for both NER and sentence classification, and contributes original insights for designing new, superior deep learning approaches for low-resource biomedical domains. 相似文献
3.
Denis Newman-Griffis Guy Divita Bart Desmet Ayah Zirikly Carolyn P Ros Eric Fosler-Lussier 《J Am Med Inform Assoc》2021,28(3):516
ObjectivesNormalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity—words or phrases that may refer to different concepts—has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research.Materials and MethodsWe identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language.ResultsWe found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena.DiscussionExisting datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.ConclusionsOur findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization. 相似文献
4.
Tomasz Oliwa Brian Furner Jessica Schmitt John Schneider Jessica P Ridgway 《J Am Med Inform Assoc》2021,28(1):104
ObjectiveAdherence to a treatment plan from HIV-positive patients is necessary to decrease their mortality and improve their quality of life, however some patients display poor appointment adherence and become lost to follow-up (LTFU). We applied natural language processing (NLP) to analyze indications towards or against LTFU in HIV-positive patients’ notes.Materials and MethodsUnstructured lemmatized notes were labeled with an LTFU or Retained status using a 183-day threshold. An NLP and supervised machine learning system with a linear model and elastic net regularization was trained to predict this status. Prevalence of characteristics domains in the learned model weights were evaluated.ResultsWe analyzed 838 LTFU vs 2964 Retained notes and obtained a weighted F1 mean of 0.912 via nested cross-validation; another experiment with notes from the same patients in both classes showed substantially lower metrics. “Comorbidities” were associated with LTFU through, for instance, “HCV” (hepatitis C virus) and likewise “Good adherence” with Retained, represented with “Well on ART” (antiretroviral therapy).DiscussionMentions of mental health disorders and substance use were associated with disparate retention outcomes, however history vs active use was not investigated. There remains further need to model transitions between LTFU and being retained in care over time.ConclusionWe provided an important step for the future development of a model that could eventually help to identify patients who are at risk for falling out of care and to analyze which characteristics could be factors for this. Further research is needed to enhance this method with structured electronic medical record fields. 相似文献
5.
ObjectiveSubstance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data.Materials and MethodsPediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC).ResultsThe dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively).ConclusionsIt is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies. 相似文献
6.
7.
Hua Xu Min Jiang Matt Oetjens Erica A Bowton Andrea H Ramirez Janina M Jeff Melissa A Basford Jill M Pulley James D Cowan Xiaoming Wang Marylyn D Ritchie Daniel R Masys Dan M Roden Dana C Crawford Joshua C Denny 《J Am Med Inform Assoc》2011,18(4):387-391