Automatic extraction of medication information from medical discharge summaries

Hui Yang 《J Am Med Inform Assoc》2010,17(5):545-548

Objective

This article describes a system developed for the 2009 i2b2 Medication Extraction Challenge. The purpose of this challenge is to extract medication information from hospital discharge summaries.

Design

The system explored several linguistic natural language processing techniques (eg, term-based and token-based rule matching) to identify medication-related information in the narrative text. A number of lexical resources were constructed to profile lexical or morphological features for different categories of medication constituents.

Measurements

Performance was evaluated in terms of the micro-averaged F-measure at the horizontal system level.

Results

The automated system performed well, and achieved an F-micro of 80% for the term-level results and 81% for the token-level results, placing it sixth in exact matches and fourth in inexact matches in the i2b2 competition.

Conclusion

The overall results show that this relatively simple rule-based approach is capable of tackling multiple entity identification tasks such as medication extraction under situations in which few training documents are annotated for machine learning approaches, and the entity information can be characterized with a set of feature tokens.

Extracting medical information from narrative patient records: the case of medication-related information

Louise Deléger Cyril Grouin Pierre Zweigenbaum 《J Am Med Inform Assoc》2010,17(5):555-558

Objective

While essential for patient care, information related to medication is often written as free text in clinical records and, therefore, difficult to use in computerized systems. This paper describes an approach to automatically extract medication information from clinical records, which was developed to participate in the i2b2 2009 challenge, as well as different strategies to improve the extraction.

Design

Our approach relies on a semantic lexicon and extraction rules as a two-phase strategy: first, drug names are recognized and, then, the context of these names is explored to extract drug-related information (mode, dosage, etc) according to rules capturing the document structure and the syntax of each kind of information. Different configurations are tested to improve this baseline system along several dimensions, particularly drug name recognition—this step being a determining factor to extract drug-related information. Changes were tested at the level of the lexicons and of the extraction rules.
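A minimal Python sketch of the two-phase strategy described above: a lexicon lookup finds drug names, then simple rules search the text following each name for a dosage and a mode. The lexicon entries, the regular expressions, the 40-character window, and the name `extract_medications` are all illustrative assumptions, not the authors' actual resources or rules.

```python
import re

# Hypothetical toy lexicon; a real system would use a far larger semantic lexicon.
DRUG_LEXICON = {"ativan", "aspirin", "lisinopril"}

# Illustrative rules only: a number (or numeric range) followed by a unit, and a few mode words.
DOSAGE_RE = re.compile(
    r"\b\d+(?:\.\d+)?(?:\s*[-–]\s*\d+(?:\.\d+)?)?\s*(?:mg|mcg|g|ml|units?)\b",
    re.IGNORECASE,
)
MODE_RE = re.compile(r"\b(?:po|iv|orally|topically)\b", re.IGNORECASE)


def extract_medications(text, window=40):
    """Phase 1: find drug names via the lexicon. Phase 2: search the following context."""
    entries = []
    for match in re.finditer(r"[A-Za-z]+", text):
        if match.group().lower() not in DRUG_LEXICON:
            continue
        # Context = the characters immediately after the drug name.
        context = text[match.end(): match.end() + window]
        dosage = DOSAGE_RE.search(context)
        mode = MODE_RE.search(context)
        entries.append({
            "name": match.group().lower(),
            "dosage": dosage.group() if dosage else None,
            "mode": mode.group() if mode else None,
        })
    return entries


if __name__ == "__main__":
    print(extract_medications("She was started on aspirin 81 mg po daily and lisinopril 10 mg."))
```

Because the window is a fixed number of characters, the context of one drug can run into the attributes of the next; the sketch simply takes the first match, whereas the abstract describes rules that also use document structure and syntax.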

Results

The initial system participating in i2b2 achieved good results (global F-measure of 77%). Further testing of different configurations substantially improved the system (global F-measure of 81%), performing well for all types of information (eg, 84% for drug names and 88% for modes), except for durations and reasons, which remain problematic.

Conclusion

This study demonstrates that a simple rule-based system can achieve good performance on the medication extraction task. We also showed that controlled modifications (lexicon filtering and rule refinement) were the improvements that best raised the performance.

Linguistic approach for identification of medication names and related information in clinical narratives

Thierry Hamon Natalia Grabar 《J Am Med Inform Assoc》2010,17(5):549-554

Background

Pharmacotherapy is an integral part of any medical care process and plays an important role in the medical history of most patients. Information on medication is crucial for several tasks such as pharmacovigilance, medical decision or biomedical research.

Objectives

Within a narrative text, medication-related information can be buried within other non-relevant data. Specific methods, such as those provided by text mining, must be designed for accessing it, and this is the objective of this study.

Methods

The authors designed a system for analyzing narrative clinical documents to extract from them medication occurrences and medication-related information. The system also attempts to deduce medications not covered by the dictionaries used.

Results

Results provided by the system were evaluated within the framework of the I2B2 NLP challenge held in 2009. The system achieved an F-measure of 0.78 and ranked 7th out of 20 participating teams (the highest F-measure was 0.86). The system provided good results for the annotation and extraction of medication names, their frequency, dosage and mode of administration (F-measure over 0.81), while information on duration and reasons is poorly annotated and extracted (F-measure 0.36 and 0.29, respectively). The performance of the system was stable between the training and test sets.

A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction

Qi Li Haijun Zhai Louise Deleger Todd Lingren Megan Kaiser Laura Stoutenborough Imre Solti 《J Am Med Inform Assoc》2013,20(5):915-921

Objective

The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication–attribute linkage detection in two clinical corpora.

Data and methods

We double annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets, and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system's performance against the human-generated gold standard.

Results

The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora.

Discussion and conclusions

We compared the novel MLSL method with a binary classification and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.

Medication information extraction with linguistic pattern matching and semantic rules

Irena Spasić Farzaneh Sarafraz John A Keane Goran Nenadić 《J Am Med Inform Assoc》2010,17(5):532-535

Objective

This study presents a system developed for the 2009 i2b2 Challenge in Natural Language Processing for Clinical Data, whose aim was to automatically extract certain information about medications used by a patient from his/her medical report. The aim was to extract the following information for each medication: name, dosage, mode/route, frequency, duration and reason.

Design

The system implements a rule-based methodology, which exploits typical morphological, lexical, syntactic and semantic features of the targeted information. These features were acquired from the training dataset and public resources such as the UMLS and relevant web pages. Information extracted by pattern matching was combined together using context-sensitive heuristic rules.

Measurements

The system was applied to a set of 547 previously unseen discharge summaries, and the extracted information was evaluated against a manually prepared gold standard consisting of 251 documents. The overall ranking of the participating teams was obtained using the micro-averaged F-measure as the primary evaluation metric.

Results

The implemented method achieved the micro-averaged F-measure of 81% (with 86% precision and 77% recall), which ranked this system third in the challenge. The significance tests revealed the system's performance to be not significantly different from that of the second ranked system. Relative to other systems, this system achieved the best F-measure for the extraction of duration (53%) and reason (46%).

Conclusion

Based on the F-measure, the performance achieved (81%) was in line with the initial agreement between human annotators (82%), indicating that such a system may greatly facilitate the process of extracting relevant information from medical records by providing a solid basis for a manual review process.

The 2009 i2b2 medication extraction challenge¹ focused on the extraction of medication-related information including: medication name (m), dosage (do), mode (mo), frequency (f), duration (du) and reason (r) from discharge summaries. In other words, free-text medical records needed to be converted into a structured form by filling a template (a data structure with the predefined slots)² with the relevant information extracted (slot fillers). For example, the following sentence:

“In the past two months, she had been taking Ativan of 3–4 mg q.d. for anxiety.”

should be converted automatically into a structured form as follows:

m=“ativan” ‖ do=“3–4 mg” ‖ mo=“nm” ‖ f=“q.d.” ‖ du=“two months” ‖ r=“for anxiety”

Note that only explicitly mentioned information was to be extracted with no attempt to map it to standardized terminology or to interpret it semantically.
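The slot-filling template in the example can be sketched in Python as follows. The function name `format_entry` is hypothetical, straight quotes replace the typographic ones, and the sketch assumes that "nm" marks a slot with no explicit mention (as the empty mode slot in the example suggests).

```python
# Slot names in the order used by the example: name, dosage, mode,
# frequency, duration, reason.
FIELDS = ("m", "do", "mo", "f", "du", "r")


def format_entry(slots):
    """Serialize one medication entry; empty or missing slots become "nm"."""
    parts = []
    for name in FIELDS:
        value = slots.get(name) or "nm"  # assumption: "nm" = not mentioned
        parts.append(f'{name}="{value}"')
    return " ‖ ".join(parts)


if __name__ == "__main__":
    entry = {
        "m": "ativan",
        "do": "3–4 mg",
        "f": "q.d.",
        "du": "two months",
        "r": "for anxiety",
    }
    print(format_entry(entry))
```

Keeping the slot order in one tuple means every entry is serialized with the same six fields, whether or not the extractor found a value for each.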

Integrating existing natural language processing tools for medication extraction from discharge summaries

Son Doan Lisa Bastarache Sergio Klimkowski Joshua C Denny Hua Xu 《J Am Med Inform Assoc》2010,17(5):528-531

Objective

To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration.

Design

We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with minimal to no specific training for this document corpus; thus, evaluating the portability of those NLP tools beyond their home institution. The integrated system was developed using 17 notes that were annotated by the organizers and evaluated using 251 notes that were annotated by participating teams.

Measurements

The i2b2 challenge used standard measures, including precision, recall, and F-measure, to evaluate the performance of participating systems. There were two ways to determine whether an extracted textual finding is correct or not: exact matching or inexact matching. The overall performance for all six types of medication-related findings across 251 annotated notes was considered as the primary metric in the challenge.
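One way to picture the difference between the two matching modes, sketched in Python. Spans are assumed to be half-open `(start, end)` token offsets, and inexact matching is approximated here as any overlap; the challenge's precise definition may differ, so this is an illustration only.

```python
def exact_match(predicted, gold):
    """Exact matching: the predicted span must have the same boundaries as the gold span."""
    return predicted == gold


def inexact_match(predicted, gold):
    """Inexact matching (approximation): the two half-open spans share at least one token."""
    return predicted[0] < gold[1] and gold[0] < predicted[1]


if __name__ == "__main__":
    gold = (3, 5)            # tokens 3 and 4
    print(exact_match((3, 5), gold), inexact_match((3, 5), gold))
    print(exact_match((4, 7), gold), inexact_match((4, 7), gold))
    print(exact_match((5, 7), gold), inexact_match((5, 7), gold))
```

Under these assumptions every exact match is also an inexact match, so inexact scoring can only credit more predictions than exact scoring on the same output.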

Results

Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information.
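The reported F-measures can be checked against the reported precision and recall, since the balanced F-measure is their harmonic mean. A short Python sketch (the function name `f_measure` and the `beta` parameter are illustrative):

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    print(round(f_measure(0.839, 0.803), 3))  # exact matching
    print(round(f_measure(0.866, 0.782), 3))  # inexact matching
```

With the figures above, the exact-matching pair gives 0.821 and the inexact-matching pair gives 0.822, in agreement with the reported values.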

Conclusions

The results show that the existing MedEx system, together with other NLP components, can extract medication information in clinical text from institutions other than the site of algorithm development with reasonable performance.

Automated concept-level information extraction to reduce the need for custom software and rules development

Leonard W D'Avolio Thien M Nguyen Sergey Goryachev Louis D Fiore 《J Am Med Inform Assoc》2011,18(5):607-613

Objective

Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval.

Materials and methods

A ‘learn by example’ approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance.

Results

Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks.

Discussion

With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation.

Conclusion

Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.

Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries

Xu Y Hong K Tsujii J Chang EI 《J Am Med Inform Assoc》2012,19(5):824-832

Objective

A system that translates narrative text in the medical domain into structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.

Design

The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts that use a dosage-unit dictionary to dynamically switch two models based on Conditional Random Fields (CRF), (4) classifying assertions based on voting of five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.
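Step (4) combines five classifiers by voting. The abstract does not say which voting rule was used; the Python sketch below assumes a simple plurality vote, with hypothetical assertion labels and a hypothetical function name `plurality_vote`.

```python
from collections import Counter


def plurality_vote(labels):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first in `labels`,
    because Counter.most_common keeps first-seen order among equal counts.
    """
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # One prediction per classifier for a single concept (hypothetical labels).
    votes = ["present", "absent", "present", "possible", "present"]
    print(plurality_vote(votes))
```

With five voters and more than two classes, a 2-2-1 split is possible, so a real system would need an explicit tie-breaking policy rather than the first-seen default used here.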

Measurements

Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results.
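The two averaging schemes differ in where the averaging happens: micro-averaging pools the counts of all classes before computing the scores, while macro-averaging computes a score per class and then averages the scores. A Python sketch with made-up counts (the class names and numbers are purely illustrative):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_f1(counts):
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)[2]


def macro_f1(counts):
    """Compute F1 per class, then take the unweighted mean."""
    return sum(prf(*c)[2] for c in counts.values()) / len(counts)


if __name__ == "__main__":
    # Hypothetical (tp, fp, fn) counts: one frequent, easy class and one rare, hard class.
    counts = {"frequent": (90, 10, 10), "rare": (10, 10, 30)}
    print(micro_f1(counts), macro_f1(counts))
```

In this toy case the micro-average is higher than the macro-average, because the frequent, well-handled class dominates the pooled counts.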

Results

The performance is competitive with the state-of-the-art systems with micro-averaged F-measure of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.

Conclusions

The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching models, one of which is especially designed for telegraphic sentences, improved extraction of the treatment concept significantly. In assertion classification, a set of features derived from a rule-based classifier proved effective for classes such as conditional and possible. These classes would suffer from data scarcity in conventional machine-learning methods. In relation identification, we use a two-stage architecture, the second stage of which applies pairwise classifiers to possible candidate classes. This architecture significantly improves performance.

Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification

Sunghwan Sohn Kavishwar B Wagholikar Dingcheng Li Siddhartha R Jonnalagadda Cui Tao Ravikumar Komandur Elayavilli Hongfang Liu 《J Am Med Inform Assoc》2013,20(5):836-842

Background

Temporal information detection systems have been developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge.

Objective

To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text.

Materials and methods

The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features including lexical information, natural language elements, and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern match and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in an Apache Unstructured Information Management Architecture framework.
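As a rough illustration of rule-based time extraction with normalization, the Python sketch below matches one date pattern and rewrites it as an ISO-style value. The single pattern, the assumption of US month/day order, and the function name `normalize_dates` are illustrative choices; the system described above uses a much richer rule set.

```python
import re

# One illustrative rule: numeric dates written as month/day/four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def normalize_dates(text):
    """Return (surface form, normalized value) pairs for each plausible date in the text."""
    results = []
    for match in DATE_RE.finditer(text):
        month, day, year = (int(group) for group in match.groups())
        # Coarse plausibility check; it does not validate days per month.
        if 1 <= month <= 12 and 1 <= day <= 31:
            results.append((match.group(), f"{year:04d}-{month:02d}-{day:02d}"))
    return results


if __name__ == "__main__":
    print(normalize_dates("Admitted on 3/14/2009 and discharged 03/20/2009."))
```

Extraction and normalization are scored separately in the results that follow (F-measure versus value accuracy), which is why the sketch returns both the matched text and the normalized value.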

Results

Our TIMEX3 system performed the best (F-measure of 0.900, value accuracy 0.731) among the challenge teams. The Event system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537.

Conclusions

Our TIMEX3 system demonstrated good capability of regular expression rules to extract and normalize time information. Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance.

Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification

Anne-Lyse Minard Anne-Laure Ligozat Asma Ben Abacha Delphine Bernhard Bruno Cartoni Louise Deléger Brigitte Grau Sophie Rosset Pierre Zweigenbaum Cyril Grouin 《J Am Med Inform Assoc》2011,18(5):588-593

Objective

This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts.

Design

The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods.

Results

The authors' assertion annotation system obtained an F-measure of 0.931, ranking fifth out of 21 participants at the i2b2/VA 2010 challenge. The authors' relation annotation system ranked third out of 16 participants with a 0.709 F-measure. The 0.773 F-measure the authors obtained on concept extraction did not make it to the top 10.

Conclusion

On the one hand, the authors confirm that the use of only machine-learning methods is highly dependent on the annotated training data, and thus obtained better results for well-represented classes. On the other hand, the use of only a rule-based method was not sufficient to deal with new types of data. Finally, the use of hybrid approaches combining machine-learning and rule-based approaches yielded higher scores.

Lancet: a high precision medication event extraction system for clinical text

Zuofeng Li Feifan Liu Lamont Antieau Yonggang Cao Hong Yu 《J Am Med Inform Assoc》2010,17(5):563-567

Objective

This paper presents Lancet, a supervised machine-learning system that automatically extracts medication events consisting of medication names and information pertaining to their prescribed use (dosage, mode, frequency, duration and reason) from lists or narrative text in medical discharge summaries.

Design

Lancet incorporates three supervised machine-learning models: a conditional random fields model for tagging individual medication names and associated fields, an AdaBoost model with decision stump algorithm for determining which medication names and fields belong to a single medication event, and a support vector machines disambiguation model for identifying the context style (narrative or list).
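Sequence taggers such as conditional random fields are usually trained on one label per token. The abstract does not state which label scheme Lancet uses; the Python sketch below shows the common BIO (begin/inside/outside) encoding as an assumption, with hypothetical field labels and a hypothetical function name `bio_tags`.

```python
def bio_tags(n_tokens, spans):
    """Convert labeled spans into one BIO tag per token.

    `spans` holds (start, end, label) triples over half-open token ranges
    and is assumed not to contain overlapping spans.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the field
        for index in range(start + 1, end):
            tags[index] = f"I-{label}"      # remaining tokens of the field
    return tags


if __name__ == "__main__":
    tokens = ["taking", "Ativan", "3", "mg", "q.d."]
    spans = [(1, 2, "m"), (2, 4, "do"), (4, 5, "f")]
    for token, tag in zip(tokens, bio_tags(len(tokens), spans)):
        print(token, tag)
```

A tagger trained on such labels only marks individual names and fields; grouping them into medication events is a separate step, which the abstract assigns to the AdaBoost model.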

Measurements

The authors, from the University of Wisconsin-Milwaukee, participated in the third i2b2 shared-task for challenges in natural language processing for clinical data: medication extraction challenge. With the performance metrics provided by the i2b2 challenge, the micro F1 (precision/recall) scores are reported for both the horizontal and vertical level.

Results

Among the top 10 teams, Lancet achieved the highest precision at 90.4% with an overall F1 score of 76.4% (horizontal system level with exact match), a gain of 11.2% and 12%, respectively, compared with the rule-based baseline system jMerki. By combining the two systems, the hybrid system further increased the F1 score by 3.4% from 76.4% to 79.0%.

Conclusions

Supervised machine-learning systems with minimal external knowledge resources can achieve a high precision with a competitive overall F1 score. Lancet based on this learning framework does not rely on expensive manually curated rules. The system is available online at http://code.google.com/p/lancet/.

Pharmacotherapy is an important part of a patient's medical treatment, and nearly all patient records incorporate a significant amount of medication information. The administration of medication at a specific time point during the patient's medical diagnosis, treatment, or prevention of disease is referred to as a medication event,¹⁻³ and the written representation of these events typically comprises the name of the medication and any of its associated fields, including but not limited to dosage, mode, frequency, etc.⁴ Accurately capturing medication events from patient records is an important step toward large-scale data mining and knowledge discovery,⁵ medication surveillance and clinical decision support⁶ and medication reconciliation.⁷⁻¹⁰

In addition to its importance, medication event information (eg, treatment outcomes, medication reactions and allergy information) is often difficult to extract, as clinical records exhibit a range of different styles and grammatical structures for recording such information.⁴ Therefore, Informatics for Integrating Biology & the Bedside (i2b2) recognized automatic medication event extraction with natural language processing (NLP) approaches as one of the great challenges in medical informatics. As one of 20 groups that participated in the i2b2 medication extraction challenge, we report in this study on Lancet, which we developed for medication event extraction.

Improving textual medication extraction using combined conditional random fields and rule-based systems

Domonkos Tikk Illés Solt 《J Am Med Inform Assoc》2010,17(5):540-544

Objective

In the i2b2 Medication Extraction Challenge, medication names together with details of their administration were to be extracted from medical discharge summaries.

Design

The task of the challenge was decomposed into three pipelined components: named entity identification, context-aware filtering and relation extraction. For named entity identification (NEI), first a rule-based (RB) method that was used in our overall fifth place-ranked solution at the challenge was investigated. Second, a conditional random fields (CRF) approach for NEI, developed after the completion of the challenge, is presented. The CRF models are trained on the 17 ground truth documents and on the output of the rule-based NEI component on all documents, a larger but potentially inaccurate training dataset. For both NEI approaches their effect on relation extraction performance was investigated. The filtering and relation extraction components are both rule-based.

Measurements

In addition to the official entry level evaluation of the challenge, entity level analysis is also provided.

Results

On the test data an entry level F₁-score of 80% was achieved for exact matching and 81% for inexact matching with the RB-NEI component. The CRF alone produces a significantly weaker result, but with the additional training data it outperforms the rule-based model with 81% exact and 82% inexact F₁-score (p<0.02).

Conclusion

This study shows that a simple rule-based method is on a par with more complicated machine learners; CRF models can benefit from the addition of the potentially inaccurate training data, when only very few training documents are available. Such training data could be generated using the outputs of rule-based methods.

Biomedical text mining has been a continuously growing field in the past few decades, because it has proved its efficiency in a wide range of application areas, such as the identification of biological entities (MeSH terms, proteins, genes, etc.)¹ ² and their relationships³ ⁴ in free text, assigning insurance codes to clinical records,⁵ facilitating querying in biomedical databases⁶; for a recent survey, see Cohen and Hersh.⁷

Pharmacotherapy information, including patients' responses to medications, is found in textual clinical records, such as discharge summaries. Physicians may be interested in analyzing statistically relevant data or specific cases based on clinical records. For this, such texts have to be processed automatically, extracting relevant pieces of information and arranging them into meaningful structures. These tasks are called information extraction (IE) and relation extraction (RE) in the text mining fields.

The goal of the Informatics for Integrating Biology and the Bedside (i2b2) Medication Extraction Challenge⁸ was to extract from discharge summaries information on medications experienced by the patient. This task, termed medication extraction, is a relational information extraction problem consisting of three subtasks. First, text fragments of different semantic types in free text have to be found; this is called named entity identification (NEI) or tagging. Second, filtering is performed to limit the scope of the RE. Third, in RE the entities within the scope of interest are examined to determine whether they are related or not.

Here we propose a RE pipeline for medication extraction. We briefly discuss the results and lessons learned from our study. For the community, we provide an appendix to this paper (available online only at http://jamia.bmj.com) and the source code (available at http://www.categorizer.tmit.bme.hu/~illes/i2b2/medication).

A knowledge discovery and reuse pipeline for information extraction in clinical notes

Jon D Patrick Dung H M Nguyen Yefeng Wang Min Li 《J Am Med Inform Assoc》2011,18(5):574-579

Objective

Information extraction and classification of clinical data are current challenges in natural language processing. This paper presents a cascaded method to deal with three different extractions and classifications in clinical data: concept annotation, assertion classification and relation classification.

Materials and Methods

A pipeline system was developed for clinical natural language processing that includes a proofreading process, with gold-standard reflexive validation and correction. The information extraction system is a combination of a machine learning approach and a rule-based approach. The outputs of this system are used for evaluation in all three tiers of the fourth i2b2/VA shared-task and workshop challenge.

Results

Overall concept classification attained an F-score of 83.3% against a baseline of 77.0%, the optimal F-score for assertions about the concepts was 92.4% and relation classifier attained 72.6% for relationships between clinical concepts against a baseline of 71.0%. Micro-average results for the challenge test set were 81.79%, 91.90% and 70.18%, respectively.

Discussion

The challenge in the multi-task test requires a distribution of time and work load for each individual task so that the overall performance evaluation on all three tasks would be more informative rather than treating each task assessment as independent. The simplicity of the model developed in this work should be contrasted with the very large feature space of other participants in the challenge who only achieved slightly better performance. There is a need to charge a penalty against the complexity of a model as defined in message minimalisation theory when comparing results.

Conclusion

A complete pipeline system for constructing language processing models that can be used to process multiple practical detection tasks of language structures of clinical records is presented.

Automated identification of drug and food allergies entered using non-standard terminology

Richard H Epstein Paul St Jacques Michael Stockin Brian Rothman Jesse M Ehrenfeld Joshua C Denny 《J Am Med Inform Assoc》2013,20(5):962-968

Objective

An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems.

Materials and methods

An algorithm was developed in Transact-SQL to identify ingredients to which patients had allergies in a perioperative information management system. The algorithm used RxNorm and natural language processing techniques developed on a training set of 24 599 entries from 9445 records. Accuracy, specificity, precision, recall, and F-measure were determined for the training dataset and repeated for the testing dataset (24 857 entries from 9430 records).

Results

Accuracy, precision, recall, and F-measure for medication allergy matches were all above 98% in the training dataset and above 97% in the testing dataset for all allergy entries. Corresponding values for food allergy matches were above 97% and above 93%, respectively. Specificities of the algorithm were 90.3% and 85.0% for drug matches and 100% and 88.9% for food matches in the training and testing datasets, respectively.
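Specificity can sit well below the other measures when true negatives are rare, as the figures above suggest. A Python sketch with made-up confusion-matrix counts shows the effect (the counts and the function name `confusion_metrics` are illustrative, not taken from the study):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, specificity, precision and recall from confusion-matrix counts."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "specificity": ratio(tn, tn + fp),   # share of true negatives correctly rejected
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts: many positives, few negatives.
    for name, value in confusion_metrics(tp=970, fp=10, tn=90, fn=30).items():
        print(f"{name}: {value:.3f}")
```

With only 100 true negatives in this toy example, ten false positives are enough to pull specificity down to 0.90 while precision stays near 0.99.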

Discussion

The algorithm had high performance for identification of medication and food allergies. Maintenance is practical, as updates are managed through upload of new RxNorm versions and additions to companion database tables. However, direct entry of codified allergy information by providers (through autocompleters or drop lists) is still preferred to post-hoc encoding of the data. Data tables used in the algorithm are available for download.

Conclusions

A high performing, easily maintained algorithm can successfully identify medication and food allergies from free text entries in EHR systems.

Use of health information technology in home health and hospice agencies: United States, 2007

Helaine E Resnick Majd Alwan 《J Am Med Inform Assoc》2010,17(4):389-395

Objective

This report provides updated estimates on use of electronic medical records (EMRs) in US home health and hospice (HHH) agencies, describes utilization of EMR functionalities, and presents novel data on telemedicine and point of care documentation (PoCD) in this setting.

Design

Nationally representative, cross-sectional survey of US HHH agencies conducted in 2007.

Measurements

Data on agency characteristics, current use of EMR systems as well as use of telemedicine and PoCD were collected.

Results

In 2007, 43% of US HHH agencies reported use of an EMR system. Patient demographics (40%) and clinical notes (34%) were the most commonly used EMR functions among US HHH agencies. Only 20% of agencies with EMR systems had health information sharing functionality and about half of them used it. Telemedicine was used by 21% of all HHH agencies, with most (87%) of these offering home health services. Among home health agencies using telemedicine, greater than 90% used telephone monitoring and about two-thirds used non-video monitoring. Nearly 29% of HHH agencies reported using electronic PoCD systems, most often for Outcome and Assessment Information Set (OASIS) data capture (79%). Relative to for-profit HHH agencies, non-profit agencies used considerably more EMR (70% vs 28%, p<0.001) and PoCD (63% vs 9%, p<0.001).

Conclusions

Between 2000 and 2007, there was a 33% increase in use of EMR among HHH agencies in the US. In 2007, use of EMR and PoCD technologies was significantly higher in non-profit agencies than in for-profit ones. Finally, HHH agencies generally tended to use available EMR functionalities, including health information sharing.

16.

Effects of personal identifier resynthesis on clinical text de-identification

Reyyan Yeniterzi John Aberdeen Samuel Bayer Ben Wellner Lynette Hirschman Bradley Malin 《J Am Med Inform Assoc》2010,17(2):159-168

Objective

De-identified medical records are critical to biomedical research. Text de-identification software exists, including “resynthesis” components that replace real identifiers with synthetic identifiers. The goal of this research is to evaluate the effectiveness and examine possible bias introduced by resynthesis on de-identification software.

Design

We evaluated the open-source MITRE Identification Scrubber Toolkit, which includes a resynthesis capability, with clinical text from Vanderbilt University Medical Center patient records. We investigated four record classes from over 500 patients' files, including laboratory reports, medication orders, discharge summaries and clinical notes. We trained and tested the de-identification tool on real and resynthesized records.

Measurements

We measured performance in terms of precision, recall, F-measure and accuracy for the detection of protected health identifiers as designated by the HIPAA Safe Harbor Rule.
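As a rough illustration of how these four measures relate, the sketch below computes them from confusion counts. The function name and the counts are hypothetical and are not taken from the study; the example assumes that most scored items are not identifiers, which is one way accuracy can stay high while F-measure drops.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return precision, recall, F-measure and accuracy from confusion counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives for identifier detection.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts: 100 true identifiers among 1000 scored items.
m = detection_metrics(tp=80, fp=20, fn=20, tn=880)
print(m)
```

With these made-up counts, accuracy is dominated by the many true negatives, whereas precision, recall and F-measure depend only on how the identifiers themselves were handled.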

Results

The de-identification tool was trained and tested on a collection of real and resynthesized Vanderbilt records. Results for training and testing on the real records were 0.990 accuracy and 0.960 F-measure. The results improved when trained and tested on resynthesized records with 0.998 accuracy and 0.980 F-measure but deteriorated moderately when trained on real records and tested on resynthesized records with 0.989 accuracy and 0.862 F-measure. Moreover, the results declined significantly when trained on resynthesized records and tested on real records with 0.942 accuracy and 0.728 F-measure.

Conclusion

The de-identification tool achieves high accuracy when training and test sets are homogeneous (ie, both real or resynthesized records). The resynthesis component regularizes the data to make them less “realistic,” resulting in loss of performance particularly when training on resynthesized data and testing on real data.

17.

MITRE system for clinical assertion status classification

Cheryl Clark John Aberdeen Matt Coarr David Tresner-Kirsch Ben Wellner Alexander Yeh Lynette Hirschman 《J Am Med Inform Assoc》2011,18(5):563-567

Objective

To describe a system for determining the assertion status of medical problems mentioned in clinical reports, which was entered in the 2010 i2b2/VA community evaluation ‘Challenges in natural language processing for clinical data’ for the task of classifying assertions associated with problem concepts extracted from patient records.

Materials and methods

A combination of machine learning (conditional random field and maximum entropy) and rule-based (pattern matching) techniques was used to detect negation, speculation, and hypothetical and conditional information, as well as information associated with persons other than the patient.

Results

The best submission obtained an overall micro-averaged F-score of 0.9343.

Conclusions

Using semantic attributes of concepts and information about document structure as features for statistical classification of assertions is a good way to leverage rule-based and statistical techniques. In this task, the choice of features may be more important than the choice of classifier algorithm.

18.

Automatic detection of omissions in medication lists

Sharique Hasan George T Duncan Daniel B Neill Rema Padman 《J Am Med Inform Assoc》2011,18(4):449-458

Objective

Evidence suggests that the medication lists of patients are often incomplete and could negatively affect patient outcomes. In this article, the authors propose the application of collaborative filtering methods to the medication reconciliation task. Given a current medication list for a patient, the authors employ collaborative filtering approaches to predict drugs that the patient could be taking but that are missing from their observed list.

Design

The collaborative filtering approach presented in this paper emerges from the insight that an omission in a medication list is analogous to an item a consumer might purchase from a product list. Online retailers use collaborative filtering to recommend relevant products using retrospective purchase data. In this article, the authors argue that patient information in electronic medical records, combined with artificial intelligence methods, can enhance medication reconciliation. The authors formulate the detection of omissions in medication lists as a collaborative filtering problem. Detection of omissions is accomplished using several machine-learning approaches. The effectiveness of these approaches is evaluated using medication data from three long-term care centers. The authors also propose several decision-theoretic extensions to the methodology for incorporating medical knowledge into recommendations.
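To make the retail analogy concrete, the following is a minimal sketch of one simple collaborative filtering variant: scoring candidate drugs by how often they co-occur with the drugs already on a patient's list. It is not the authors' method, which the abstract does not specify beyond "several machine-learning approaches"; the function names and the example medication lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(medication_lists):
    """Count how often each ordered pair of drugs appears on the same list."""
    pair_counts = Counter()
    for meds in medication_lists:
        for a, b in combinations(sorted(set(meds)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts


def suggest_omissions(observed, pair_counts, top_k=10):
    """Rank drugs absent from `observed` by co-occurrence with listed drugs."""
    observed = set(observed)
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a in observed and b not in observed:
            scores[b] += n
    return [drug for drug, _ in scores.most_common(top_k)]


# Hypothetical retrospective medication lists (illustrative only).
history = [
    ["metformin", "lisinopril", "atorvastatin"],
    ["metformin", "lisinopril"],
    ["metformin", "atorvastatin"],
    ["warfarin", "digoxin"],
]
counts = cooccurrence_counts(history)
suggestions = suggest_omissions(["lisinopril"], counts, top_k=10)
print(suggestions)
```

A top-k ranking of this kind is what an evaluation such as "the missing drug appears in the top-10 list" would score: one drug is withheld from a list, and the test checks whether it reappears among the suggestions.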

Results

Results show that collaborative filtering identifies the missing drug in the top-10 list about 40–50% of the time and the therapeutic class of the missing drug 50–65% of the time at the three clinics in this study.

Conclusion

Results suggest that collaborative filtering can be a valuable tool for reconciling medication lists, complementing currently recommended process-driven approaches. However, a one-size-fits-all approach is not optimal, and consideration should be given to context (eg, types of patients and drug regimens) and consequence (eg, the impact of omission on outcomes).

19.

Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection

Taxiarchis Botsis Michael D Nguyen Emily Jane Woo Marianthi Markatou Robert Ball 《J Am Med Inform Assoc》2011,18(5):631-638

Objective

The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.

Design

We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N_pos=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords, low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations.

Measurements

Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed.
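For readers unfamiliar with macro-averaging, the sketch below computes precision, recall and F-measure per class and then takes the unweighted mean across classes, so a rare class counts as much as a common one. The function names and labels are hypothetical, and averaging the per-class F-measures is only one common convention; the abstract does not say which variant was used.

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall and F-measure for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision, recall and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_prf(y_true, y_pred, label) for label in labels]
    n = len(labels)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical labels for six reports (illustrative only).
y_true = ["pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "neg", "pos"]
macro_p, macro_r, macro_f = macro_average(y_true, y_pred)
print(macro_p, macro_r, macro_f)
```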

Results

The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, but at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively).

Conclusion

Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.

20.

Extracting Rx information from clinical narrative

James G Mork Olivier Bodenreider Dina Demner-Fushman Rezarta Islamaj Doğan François-Michel Lang Zhiyong Lu Aurélie Névéol Lee Peters Sonya E Shooshan Alan R Aronson 《J Am Med Inform Assoc》2010,17(5):536-539

Objective

The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative.

Design

Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification.

Measurements

Evaluation metrics and corresponding results were provided by the Challenge organizers.

Results

The results indicate that robust rule-based tools achieve satisfactory results in extraction of simple elements of medication orders, but more sophisticated methods are needed for identification of reasons for the orders and durations.

Limitations

Owing to the time constraints and nature of the Challenge, some obvious follow-on analysis has not been completed yet.

Conclusions

The authors plan to integrate the new modules with MetaMap to enhance its accuracy. This integration effort will provide guidance in retargeting existing tools for better processing of clinical text.