共查询到20条相似文献,搜索用时 15 毫秒
1.
Objective
Concept extraction is a process to identify phrases referring to concepts of interests in unstructured text. It is a critical component in automated text processing. We investigate the performance of machine learning taggers for clinical concept extraction, particularly the portability of taggers across documents from multiple data sources.Methods
We used BioTagger-GM to train machine learning taggers, which we originally developed for the detection of gene/protein names in the biology domain. Trained taggers were evaluated using the annotated clinical documents made available in the 2010 i2b2/VA Challenge workshop, consisting of documents from four data sources.Results
As expected, performance of a tagger trained on one data source degraded when evaluated on another source, but the degradation of the performance varied depending on data sources. A tagger trained on multiple data sources was robust, and it achieved an F score as high as 0.890 on one data source. The results also suggest that performance of machine learning taggers is likely to improve if more annotated documents are available for training.Conclusion
Our study shows how the performance of machine learning taggers is degraded when they are ported across clinical documents from different sources. The portability of taggers can be enhanced by training on datasets from multiple sources. The study also shows that BioTagger-GM can be easily extended to detect clinical concept mentions with good performance. 相似文献2.
Leonard W D'Avolio Thien M Nguyen Wildon R Farwell Yongming Chen Felicia Fitzmeyer Owen M Harris Louis D Fiore 《J Am Med Inform Assoc》2010,17(4):375-382
Reducing custom software development effort is an important goal in information retrieval (IR). This study evaluated a generalizable approach involving with no custom software or rules development. The study used documents “consistent with cancer” to evaluate system performance in the domains of colorectal (CRC), prostate (PC), and lung (LC) cancer. Using an end-user-supplied reference set, the automated retrieval console (ARC) iteratively calculated performance of combinations of natural language processing-derived features and supervised classification algorithms. Training and testing involved 10-fold cross-validation for three sets of 500 documents each. Performance metrics included recall, precision, and F-measure. Annotation time for five physicians was also measured. Top performing algorithms had recall, precision, and F-measure values as follows: for CRC, 0.90, 0.92, and 0.89, respectively; for PC, 0.97, 0.95, and 0.94; and for LC, 0.76, 0.80, and 0.75. In all but one case, conditional random fields outperformed maximum entropy-based classifiers. Algorithms had good performance without custom code or rules development, but performance varied by specific application. 相似文献
3.
Objective
To create a highly accurate coreference system in discharge summaries for the 2011 i2b2 challenge. The coreference categories include Person, Problem, Treatment, and Test.Design
An integrated coreference resolution system was developed by exploiting Person attributes, contextual semantic clues, and world knowledge. It includes three subsystems: Person coreference system based on three Person attributes, Problem/Treatment/Test system based on numerous contextual semantic extractors and world knowledge, and Pronoun system based on a multi-class support vector machine classifier. The three Person attributes are patient, relative and hospital personnel. Contextual semantic extractors include anatomy, position, medication, indicator, temporal, spatial, section, modifier, equipment, operation, and assertion. The world knowledge is extracted from external resources such as Wikipedia.Measurements
Micro-averaged precision, recall and F-measure in MUC, BCubed and CEAF were used to evaluate results.Results
The system achieved an overall micro-averaged precision, recall and F-measure of 0.906, 0.925, and 0.915, respectively, on test data (from four hospitals) released by the challenge organizers. It achieved a precision, recall and F-measure of 0.905, 0.920 and 0.913, respectively, on test data without Pittsburgh data. We ranked the first out of 20 competing teams. Among the four sub-tasks on Person, Problem, Treatment, and Test, the highest F-measure was seen for Person coreference.Conclusions
This system achieved encouraging results. The Person system can determine whether personal pronouns and proper names are coreferent or not. The Problem/Treatment/Test system benefits from both world knowledge in evaluating the similarity of two mentions and contextual semantic extractors in identifying semantic clues. The Pronoun system can automatically detect whether a Pronoun mention is coreferent to that of the other four types. This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries. 相似文献4.
Objective
A supervised machine learning approach to discover relations between medical problems, treatments, and tests mentioned in electronic medical records.Materials and methods
A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric inform the classifier.Results
The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7.Discussion
Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently. Joint classification of relations may further improve the quality of results. Also, joint learning of the discovery of concepts, assertions, and relations may also improve the results of automatic relation extraction.Conclusion
Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. In addition, features based on similarity contribute to a decrease of 1.1% when they are not available. 相似文献5.
目的:利用MeSH组配规则自动抽取文摘中表达特定语义关系的句子,为制定自然语言处理关系抽取模板以及句子水平的信息检索提供依据。方法:根据主题词组配规则,使用python语言从文摘数据中匹配出含有特定MeSH主题词概念的候选关系句,从中抽取出以描述概念间关系的短语或句子。邀请专家对100条候选关系句进行概念间语义关系人工标注,将得到的语义关系三元组作为评价金标准,与自动抽取出的概念间关系进行对比分析。将自动抽取的结果加以整理形成特定概念之间的语义关系表达。结果:对大量的自然文本句进行句法分析,批量识别出2个特定概念间语义关系抽取方法的准确率为87%,召回率为62%,F1=71.8%。结论:利用MeSH组配规则抽取表达特定语义关系句子的方法具有较高的准确率与召回率,对生物医学文本理解及医学知识发现等具有借鉴意义。 相似文献
6.
Objective
A method for the automatic resolution of coreference between medical concepts in clinical records.Materials and methods
A multiple pass sieve approach utilizing support vector machines (SVMs) at each pass was used to resolve coreference. Information such as lexical similarity, recency of a concept mention, synonymy based on Wikipedia redirects, and local lexical context were used to inform the method. Results were evaluated using an unweighted average of MUC, CEAF, and B3 coreference evaluation metrics. The datasets used in these research experiments were made available through the 2011 i2b2/VA Shared Task on Coreference.Results
The method achieved an average F score of 0.821 on the ODIE dataset, with a precision of 0.802 and a recall of 0.845. These results compare favorably to the best-performing system with a reported F score of 0.827 on the dataset and the median system F score of 0.800 among the eight teams that participated in the 2011 i2b2/VA Shared Task on Coreference. On the i2b2 dataset, the method achieved an average F score of 0.906, with a precision of 0.895 and a recall of 0.918 compared to the best F score of 0.915 and the median of 0.859 among the 16 participating teams.Discussion
Post hoc analysis revealed significant performance degradation on pathology reports. The pathology reports were characterized by complex synonymy and very few patient mentions.Conclusion
The use of several simple lexical matching methods had the most impact on achieving competitive performance on the task of coreference resolution. Moreover, the ability to detect patients in electronic medical records helped to improve coreference resolution more than other linguistic analysis. 相似文献7.
8.
简要介绍医学情报专业教育的发展,分析新时代医学情报研究机构、卫生信息中心、数字图书馆的特点和发展要求,探讨医学情报专业教育的教学内容及方法,提出增加卫生信息化、卫生统计学教学内容、转变思想观念———由信息服务向知识服务转化等几点建议。 相似文献
9.
Objective
This paper describes natural-language-processing techniques for two tasks: identification of medical concepts in clinical text, and classification of assertions, which indicate the existence, absence, or uncertainty of a medical problem. Because so many resources are available for processing clinical texts, there is interest in developing a framework in which features derived from these resources can be optimally selected for the two tasks of interest.Materials and methods
The authors used two machine-learning (ML) classifiers: support vector machines (SVMs) and conditional random fields (CRFs). Because SVMs and CRFs can operate on a large set of features extracted from both clinical texts and external resources, the authors address the following research question: Which features need to be selected for obtaining optimal results? To this end, the authors devise feature-selection techniques which greatly reduce the amount of manual experimentation and improve performance.Results
The authors evaluated their approaches on the 2010 i2b2/VA challenge data. Concept extraction achieves 79.59 micro F-measure. Assertion classification achieves 93.94 micro F-measure.Discussion
Approaching medical concept extraction and assertion classification through ML-based techniques has the advantage of easily adapting to new data sets and new medical informatics tasks. However, ML-based techniques perform best when optimal features are selected. By devising promising feature-selection techniques, the authors obtain results that outperform the current state of the art.Conclusion
This paper presents two ML-based approaches for processing language in the clinical texts evaluated in the 2010 i2b2/VA challenge. By using novel feature-selection methods, the techniques presented in this paper are unique among the i2b2 participants. 相似文献10.
Katrin Kirchhoff Anne M Turner Amittai Axelrod Francisco Saavedra 《J Am Med Inform Assoc》2011,18(4):473-478
Objective
Accurate, understandable public health information is important for ensuring the health of the nation. The large portion of the US population with Limited English Proficiency is best served by translations of public-health information into other languages. However, a large number of health departments and primary care clinics face significant barriers to fulfilling federal mandates to provide multilingual materials to Limited English Proficiency individuals. This article presents a pilot study on the feasibility of using freely available statistical machine translation technology to translate health promotion materials.Design
The authors gathered health-promotion materials in English from local and national public-health websites. Spanish versions were created by translating the documents using a freely available machine-translation website. Translations were rated for adequacy and fluency, analyzed for errors, manually corrected by a human posteditor, and compared with exclusively manual translations.Results
Machine translation plus postediting took 15–53 min per document, compared to the reported days or even weeks for the standard translation process. A blind comparison of machine-assisted and human translations of six documents revealed overall equivalency between machine-translated and manually translated materials. The analysis of translation errors indicated that the most important errors were word-sense errors.Conclusion
The results indicate that machine translation plus postediting may be an effective method of producing multilingual health materials with equivalent quality but lower cost compared to manual translations. 相似文献11.
Michael F Gensheimer Sonya Aggarwal Kathryn R.K Benson Justin N Carter A. Solomon Henry Douglas J Wood Scott G Soltys Steven Hancock Erqi Pollom Nigam H Shah Daniel T Chang 《J Am Med Inform Assoc》2021,28(6):1108
ObjectiveBeing able to predict a patient’s life expectancy can help doctors and patients prioritize treatments and supportive care. For predicting life expectancy, physicians have been shown to outperform traditional models that use only a few predictor variables. It is possible that a machine learning model that uses many predictor variables and diverse data sources from the electronic medical record can improve on physicians’ performance. For patients with metastatic cancer, we compared accuracy of life expectancy predictions by the treating physician, a machine learning model, and a traditional model.Materials and MethodsA machine learning model was trained using 14 600 metastatic cancer patients’ data to predict each patient’s distribution of survival time. Data sources included note text, laboratory values, and vital signs. From 2015–2016, 899 patients receiving radiotherapy for metastatic cancer were enrolled in a study in which their radiation oncologist estimated life expectancy. Survival predictions were also made by the machine learning model and a traditional model using only performance status. Performance was assessed with area under the curve for 1-year survival and calibration plots.ResultsThe radiotherapy study included 1190 treatment courses in 899 patients. A total of 879 treatment courses in 685 patients were included in this analysis. Median overall survival was 11.7 months. Physicians, machine learning model, and traditional model had area under the curve for 1-year survival of 0.72 (95% CI 0.63–0.81), 0.77 (0.73–0.81), and 0.68 (0.65–0.71), respectively.ConclusionsThe machine learning model’s predictions were more accurate than those of the treating physician or a traditional model. 相似文献
12.
Cheryl Clark John Aberdeen Matt Coarr David Tresner-Kirsch Ben Wellner Alexander Yeh Lynette Hirschman 《J Am Med Inform Assoc》2011,18(5):563-567
Objective
To describe a system for determining the assertion status of medical problems mentioned in clinical reports, which was entered in the 2010 i2b2/VA community evaluation ‘Challenges in natural language processing for clinical data’ for the task of classifying assertions associated with problem concepts extracted from patient records.Materials and methods
A combination of machine learning (conditional random field and maximum entropy) and rule-based (pattern matching) techniques was used to detect negation, speculation, and hypothetical and conditional information, as well as information associated with persons other than the patient.Results
The best submission obtained an overall micro-averaged F-score of 0.9343.Conclusions
Using semantic attributes of concepts and information about document structure as features for statistical classification of assertions is a good way to leverage rule-based and statistical techniques. In this task, the choice of features may be more important than the choice of classifier algorithm. 相似文献13.
Objective
A system that translates narrative text in the medical domain into structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.Design
The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts that use a dosage-unit dictionary to dynamically switch two models based on Conditional Random Fields (CRF), (4) classifying assertions based on voting of five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.Measurements
Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results.Results
The performance is competitive with the state-of-the-art systems with micro-averaged F-measure of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.Conclusions
The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching models, one of which is especially designed for telegraphic sentences, improved extraction of the treatment concept significantly. In assertion classification, a set of features derived from a rule-based classifier were proven to be effective for the classes such as conditional and possible. These classes would suffer from data scarcity in conventional machine-learning methods. In relation identification, we use two-staged architecture, the second of which applies pairwise classifiers to possible candidate classes. This architecture significantly improves performance. 相似文献14.
研究生网络医学信息检索教学的实践及思考 总被引:2,自引:0,他引:2
结合几年来在研究生文献检索课中开展网络医学信息检索教学的实践,对教学目标、内容、课时安排和教学模式进行了探讨.认为网络检索课的主要目标是培养学生的网络信息意识和获取网络信息技巧.在课时允许情况下,应增加四方面教学内容:网络信息检索的技巧、网络原始资料的获取、网络专利信息的检索和如何精选网络信息;强调远程教育模式是未来网络检索教学的发展方向. 相似文献
15.
Jonnalagadda SR Li D Sohn S Wu ST Wagholikar K Torii M Liu H 《J Am Med Inform Assoc》2012,19(5):867-874
Objective
This paper describes the coreference resolution system submitted by Mayo Clinic for the 2011 i2b2/VA/Cincinnati shared task Track 1C. The goal of the task was to construct a system that links the markables corresponding to the same entity.Materials and methods
The task organizers provided progress notes and discharge summaries that were annotated with the markables of treatment, problem, test, person, and pronoun. We used a multi-pass sieve algorithm that applies deterministic rules in the order of preciseness and simultaneously gathers information about the entities in the documents. Our system, MedCoref, also uses a state-of-the-art machine learning framework as an alternative to the final, rule-based pronoun resolution sieve.Results
The best system that uses a multi-pass sieve has an overall score of 0.836 (average of B3, MUC, Blanc, and CEAF F score) for the training set and 0.843 for the test set.Discussion
A supervised machine learning system that typically uses a single function to find coreferents cannot accommodate irregularities encountered in data especially given the insufficient number of examples. On the other hand, a completely deterministic system could lead to a decrease in recall (sensitivity) when the rules are not exhaustive. The sieve-based framework allows one to combine reliable machine learning components with rules designed by experts.Conclusion
Using relatively simple rules, part-of-speech information, and semantic type properties, an effective coreference resolution system could be designed. The source code of the system described is available at https://sourceforge.net/projects/ohnlp/files/MedCoref. 相似文献16.
Richard H Epstein Paul St Jacques Michael Stockin Brian Rothman Jesse M Ehrenfeld Joshua C Denny 《J Am Med Inform Assoc》2013,20(5):962-968
Objective
An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems.Materials and methods
An algorithm was developed in Transact-SQL to identify ingredients to which patients had allergies in a perioperative information management system. The algorithm used RxNorm and natural language processing techniques developed on a training set of 24 599 entries from 9445 records. Accuracy, specificity, precision, recall, and F-measure were determined for the training dataset and repeated for the testing dataset (24 857 entries from 9430 records).Results
Accuracy, precision, recall, and F-measure for medication allergy matches were all above 98% in the training dataset and above 97% in the testing dataset for all allergy entries. Corresponding values for food allergy matches were above 97% and above 93%, respectively. Specificities of the algorithm were 90.3% and 85.0% for drug matches and 100% and 88.9% for food matches in the training and testing datasets, respectively.Discussion
The algorithm had high performance for identification of medication and food allergies. Maintenance is practical, as updates are managed through upload of new RxNorm versions and additions to companion database tables. However, direct entry of codified allergy information by providers (through autocompleters or drop lists) is still preferred to post-hoc encoding of the data. Data tables used in the algorithm are available for download.Conclusions
A high performing, easily maintained algorithm can successfully identify medication and food allergies from free text entries in EHR systems. 相似文献17.
李红梅 《中华医学图书情报杂志》2011,20(11):55-58
基于合作学习的基本理论,探讨了研究生医学信息检索课合作化教学的原则、策略和方法。教学实践证明,该方法能更好地发挥教师的主导作用,增强学生学习的主动性,有效培养学生的信息意识、信息能力和创新思维能力,是符合学习者心理特征的一种行之有效的教学方法。 相似文献
18.
ObjectiveSubstance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data.Materials and MethodsPediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC).ResultsThe dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively).ConclusionsIt is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies. 相似文献
19.
以首都医科大学为例,分析医学信息检索课程现状及发展需求,详细阐述形成性评价体系构建与实施,总结存在的问题并提出对策,指出该体系有助于激发学生学习兴趣,提高自主学习能力,提升信息素养。 相似文献
20.
Arjun Magge Elena Tutubalina Zulfat Miftahutdinov Ilseyar Alimova Anne Dirkson Suzan Verberne Davy Weissenbacher Graciela Gonzalez-Hernandez 《J Am Med Inform Assoc》2021,28(10):2184
ObjectiveResearch on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, it has been impeded largely by limited end-to-end solutions for large-scale analysis of social media reports for different drugs.Materials and MethodsWe present a dataset for training and evaluation of ADE pipelines where the ADE distribution is closer to the average ‘natural balance’ with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all 3 tasks.ResultsThe system presented achieved state-of-the-art performance on comparable datasets and scored a classification performance of F1 = 0.63, span extraction performance of F1 = 0.44 and an end-to-end entity resolution performance of F1 = 0.34 on the presented dataset.DiscussionThe performance of the models continues to highlight multiple challenges when deploying pharmacovigilance systems that use social media data. We discuss the implications of such models in the downstream tasks of signal detection and suggest future enhancements.ConclusionMining ADEs from Twitter posts using a pipeline architecture requires the different components to be trained and tuned based on input data imbalance in order to ensure optimal performance on the end-to-end resolution task. 相似文献