Similar literature
 20 similar articles found
1.

Objective

This paper describes the coreference resolution system submitted by Mayo Clinic for the 2011 i2b2/VA/Cincinnati shared task Track 1C. The goal of the task was to construct a system that links the markables corresponding to the same entity.

Materials and methods

The task organizers provided progress notes and discharge summaries that were annotated with the markables of treatment, problem, test, person, and pronoun. We used a multi-pass sieve algorithm that applies deterministic rules in the order of preciseness and simultaneously gathers information about the entities in the documents. Our system, MedCoref, also uses a state-of-the-art machine learning framework as an alternative to the final, rule-based pronoun resolution sieve.
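The multi-pass sieve idea described above can be sketched as follows. This is a minimal illustration, not MedCoref itself: the mention record layout and the two rules (`exact_match`, `head_match`) are hypothetical stand-ins, chosen only to show rules being applied from most to least precise while clusters accumulate across passes.

```python
from typing import Callable, Dict, List

# Hypothetical mention record: {"text": ..., "type": ...}
Mention = Dict[str, str]
Sieve = Callable[[Mention, Mention], bool]


def exact_match(a: Mention, b: Mention) -> bool:
    """High-precision rule (assumed): same type and identical text, ignoring case."""
    return a["type"] == b["type"] and a["text"].lower() == b["text"].lower()


def head_match(a: Mention, b: Mention) -> bool:
    """Lower-precision rule (assumed): same type and same final token."""
    return (a["type"] == b["type"]
            and a["text"].lower().split()[-1] == b["text"].lower().split()[-1])


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> List[int]:
    """Apply each sieve in order; return one cluster id per mention."""
    cluster = list(range(len(mentions)))  # every mention starts alone
    for sieve in sieves:  # most precise sieve first
        for j in range(len(mentions)):
            # Scan candidate antecedents from nearest to farthest.
            for i in range(j - 1, -1, -1):
                if cluster[i] == cluster[j]:
                    continue  # already linked by an earlier pass
                if sieve(mentions[i], mentions[j]):
                    old, new = cluster[j], cluster[i]
                    # Merge the two clusters built so far.
                    cluster = [new if c == old else c for c in cluster]
                    break
    return cluster


notes = [
    {"text": "chest pain", "type": "problem"},
    {"text": "Chest pain", "type": "problem"},
    {"text": "the pain", "type": "problem"},
    {"text": "aspirin", "type": "treatment"},
]
clusters = run_sieves(notes, [exact_match, head_match])
```

Because later sieves only see clusters that earlier, more precise sieves have already formed, a loose rule cannot undo a confident link.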

Results

The best system that uses a multi-pass sieve has an overall score of 0.836 (average of B3, MUC, BLANC, and CEAF F score) for the training set and 0.843 for the test set.

Discussion

A supervised machine learning system that typically uses a single function to find coreferents cannot accommodate irregularities encountered in data especially given the insufficient number of examples. On the other hand, a completely deterministic system could lead to a decrease in recall (sensitivity) when the rules are not exhaustive. The sieve-based framework allows one to combine reliable machine learning components with rules designed by experts.

Conclusion

Using relatively simple rules, part-of-speech information, and semantic type properties, an effective coreference resolution system could be designed. The source code of the system described is available at https://sourceforge.net/projects/ohnlp/files/MedCoref.

2.
Xu Y  Liu J  Wu J  Wang Y  Tu Z  Sun JT  Tsujii J  Chang EI 《J Am Med Inform Assoc》2012,19(5):897-905

Objective

To create a highly accurate coreference system in discharge summaries for the 2011 i2b2 challenge. The coreference categories include Person, Problem, Treatment, and Test.

Design

An integrated coreference resolution system was developed by exploiting Person attributes, contextual semantic clues, and world knowledge. It includes three subsystems: Person coreference system based on three Person attributes, Problem/Treatment/Test system based on numerous contextual semantic extractors and world knowledge, and Pronoun system based on a multi-class support vector machine classifier. The three Person attributes are patient, relative and hospital personnel. Contextual semantic extractors include anatomy, position, medication, indicator, temporal, spatial, section, modifier, equipment, operation, and assertion. The world knowledge is extracted from external resources such as Wikipedia.

Measurements

Micro-averaged precision, recall and F-measure in MUC, BCubed and CEAF were used to evaluate results.
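As an illustration of one of the metrics named above, here is a minimal sketch of mention-level BCubed precision, recall, and F-measure. It assumes the gold and system outputs cover exactly the same mentions (as when ground-truth mentions are given); the function name and the cluster representation are illustrative, and the challenge's official scorer may differ in details.

```python
from typing import Hashable, List, Set, Tuple


def b_cubed(gold: List[Set[Hashable]],
            system: List[Set[Hashable]]) -> Tuple[float, float, float]:
    """BCubed P/R/F, assuming each mention is in exactly one gold and one system cluster."""
    gold_of = {m: c for c in gold for m in c}
    sys_of = {m: c for c in system for m in c}
    mentions = list(gold_of)
    # Per-mention precision: share of its system cluster that is truly coreferent.
    precision = sum(len(gold_of[m] & sys_of[m]) / len(sys_of[m])
                    for m in mentions) / len(mentions)
    # Per-mention recall: share of its gold cluster that the system recovered.
    recall = sum(len(gold_of[m] & sys_of[m]) / len(gold_of[m])
                 for m in mentions) / len(mentions)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


p, r, f = b_cubed(gold=[{1, 2, 3}, {4}], system=[{1, 2}, {3, 4}])
```

In this toy case the system splits one gold chain and wrongly merges mention 3 with mention 4, so both precision and recall fall below 1.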

Results

The system achieved an overall micro-averaged precision, recall and F-measure of 0.906, 0.925, and 0.915, respectively, on test data (from four hospitals) released by the challenge organizers. It achieved a precision, recall and F-measure of 0.905, 0.920 and 0.913, respectively, on test data without Pittsburgh data. We ranked first among the 20 competing teams. Among the four sub-tasks on Person, Problem, Treatment, and Test, the highest F-measure was seen for Person coreference.

Conclusions

This system achieved encouraging results. The Person system can determine whether personal pronouns and proper names are coreferent or not. The Problem/Treatment/Test system benefits from both world knowledge in evaluating the similarity of two mentions and contextual semantic extractors in identifying semantic clues. The Pronoun system can automatically detect whether a Pronoun mention is coreferent to that of the other four types. This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries.

3.

Objective

The long-term goal of this work is the automated discovery of anaphoric relations from the clinical narrative. The creation of a gold standard set from a cross-institutional corpus of clinical notes and high-level characteristics of that gold standard are described.

Methods

A standard methodology for annotation guideline development, gold standard annotations, and inter-annotator agreement (IAA) was used.

Results

The gold standard annotations resulted in 7214 markables, 5992 pairs, and 1304 chains. Each report averaged 40 anaphoric markables, 33 pairs, and seven chains. The overall IAA is high on the Mayo dataset (0.6607), and moderate on the University of Pittsburgh Medical Center (UPMC) dataset (0.4072). The IAA between each annotator and the gold standard is high (Mayo: 0.7669, 0.7697, and 0.9021; UPMC: 0.6753 and 0.7138). These results imply a quality corpus feasible for system development. They also suggest the complementary nature of the annotations performed by the experts and the importance of an annotator team with diverse knowledge backgrounds.

Limitations

Only one of the annotators had the linguistic background necessary for annotation of the linguistic attributes. The overall generalizability of the guidelines will be further strengthened by annotations of data from additional sites. This will increase the overall corpus size and the representation of each relation type.

Conclusion

The first step toward the development of an anaphoric relation resolver as part of a comprehensive natural language processing system geared specifically for the clinical narrative in the electronic medical record is described. The deidentified annotated corpus will be available to researchers.

4.

Objective

Narratives of electronic medical records contain information that can be useful for clinical practice and multi-purpose research. This information needs to be put into a structured form before it can be used by automated systems. Coreference resolution is a step in the transformation of narratives into a structured form.

Methods

This study presents a medical coreference resolution system (MCORES) for noun phrases in four frequently used clinical semantic categories: persons, problems, treatments, and tests. MCORES treats coreference resolution as a binary classification task. Given a pair of concepts from a semantic category, it determines coreferent pairs and clusters them into chains. MCORES uses an enhanced set of lexical, syntactic, and semantic features. Some MCORES features measure the distance between various representations of the concepts in a pair and can be asymmetric.
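The step from pairwise decisions to chains can be sketched as below. This is an assumption-laden illustration rather than MCORES's actual procedure: the pairwise decision is passed in as a hypothetical `is_coreferent` callable (standing in for the trained binary classifier), pairs are treated as unordered even though some MCORES features are asymmetric, and chains are formed by simple transitive closure with a union-find structure.

```python
from itertools import combinations
from typing import Callable, List


def build_chains(concepts: List[str],
                 is_coreferent: Callable[[str, str], bool]) -> List[List[str]]:
    """Group concepts into chains by transitive closure over positive pairs."""
    parent = list(range(len(concepts)))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(concepts)), 2):
        if is_coreferent(concepts[i], concepts[j]):
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for idx, concept in enumerate(concepts):
        groups.setdefault(find(idx), []).append(concept)
    # Singletons are not chains.
    return [g for g in groups.values() if len(g) > 1]


def same_last_token(a: str, b: str) -> bool:
    """Toy stand-in for a classifier: positive when the final tokens match."""
    return a.lower().split()[-1] == b.lower().split()[-1]


chains = build_chains(["CT scan", "the scan", "MRI", "scan"], same_last_token)
```

Transitive closure is the simplest clustering choice; one wrong positive pair merges two whole chains, which is one reason pairwise systems often add a more conservative clustering step.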

Results and Conclusion

MCORES was compared with an in-house baseline that uses only single-perspective ‘token overlap’ and ‘number agreement’ features. MCORES was shown to outperform the baseline; its enhanced features contribute significantly to performance. In addition to the baseline, MCORES was compared against two available third-party, open-domain systems, RECONCILEACL09 and the Beautiful Anaphora Resolution Toolkit (BART). MCORES was shown to outperform both of these systems on clinical records.

5.

Objective

A method for the automatic resolution of coreference between medical concepts in clinical records.

Materials and methods

A multiple pass sieve approach utilizing support vector machines (SVMs) at each pass was used to resolve coreference. Information such as lexical similarity, recency of a concept mention, synonymy based on Wikipedia redirects, and local lexical context were used to inform the method. Results were evaluated using an unweighted average of MUC, CEAF, and B3 coreference evaluation metrics. The datasets used in these research experiments were made available through the 2011 i2b2/VA Shared Task on Coreference.

Results

The method achieved an average F score of 0.821 on the ODIE dataset, with a precision of 0.802 and a recall of 0.845. These results compare favorably to the best-performing system with a reported F score of 0.827 on the dataset and the median system F score of 0.800 among the eight teams that participated in the 2011 i2b2/VA Shared Task on Coreference. On the i2b2 dataset, the method achieved an average F score of 0.906, with a precision of 0.895 and a recall of 0.918 compared to the best F score of 0.915 and the median of 0.859 among the 16 participating teams.

Discussion

Post hoc analysis revealed significant performance degradation on pathology reports. The pathology reports were characterized by complex synonymy and very few patient mentions.

Conclusion

The use of several simple lexical matching methods had the most impact on achieving competitive performance on the task of coreference resolution. Moreover, the ability to detect patients in electronic medical records helped to improve coreference resolution more than other linguistic analysis.

6.

Background

The fifth i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records conducted a systematic review on resolution of noun phrase coreference in medical records. Informatics for Integrating Biology and the Bedside (i2b2) and the Veterans Affairs (VA) Consortium for Healthcare Informatics Research (CHIR) partnered to organize the coreference challenge. They provided the research community with two corpora of medical records for the development and evaluation of the coreference resolution systems. These corpora contained various record types (ie, discharge summaries, pathology reports) from multiple institutions.

Methods

The coreference challenge provided the community with two annotated ground truth corpora and evaluated systems on coreference resolution in two ways: first, it evaluated systems for their ability to identify mentions of concepts and to link together those mentions. Second, it evaluated the ability of the systems to link together ground truth mentions that refer to the same entity. Twenty teams representing 29 organizations and nine countries participated in the coreference challenge.

Results

The teams' system submissions showed that machine-learning and rule-based approaches worked best when augmented with external knowledge sources and coreference clues extracted from document structure. The systems performed better in coreference resolution when provided with ground truth mentions. Overall, the systems struggled in solving coreference resolution for cases that required domain knowledge.

7.

Objective

Patient discharge summaries provide detailed medical information about hospitalized patients and are a rich resource of data for clinical record text mining. The textual expressions of this information are highly variable. In order to acquire a precise understanding of the patient, it is important to uncover the relationship between all instances in the text. In natural language processing (NLP), this task falls under the category of coreference resolution.

Design

A key contribution of this paper is the application of contextual-dependent rules that describe relationships between coreference pairs. To resolve phrases that refer to the same entity, the authors use these rules in three representative NLP systems: one rule-based, another based on the maximum entropy model, and the last a system built on the Markov logic network (MLN) model.

Results

The experimental results show that the proposed MLN-based system outperforms the baseline system (exact match) by average F-scores of 4.3% and 5.7% on the Beth and Partners datasets, respectively. Finally, the three systems were integrated into an ensemble system, further improving performance to 87.21%, which is 4.5% more than the official i2b2 Track 1C average (82.7%).

Conclusion

In this paper, the main challenges in the resolution of coreference relations in patient discharge summaries are described. Several rules are proposed to exploit contextual information, and three approaches are presented. While single systems provided promising results, an ensemble approach combining the three systems produced a better performance than even the best single system.

8.

Objective

Public health surveillance requires outbreak detection algorithms with computational efficiency sufficient to handle the increasing volume of disease surveillance data. In response to this need, the authors propose a spatial clustering algorithm, rank-based spatial clustering (RSC), that detects rapidly infectious but non-contagious disease outbreaks.

Design

The authors compared the outbreak-detection performance of RSC with that of three well-established algorithms, namely the wavelet anomaly detector (WAD), the spatial scan statistic (KSS), and the Bayesian spatial scan statistic (BSS), using real disease surveillance data onto which they superimposed simulated disease outbreaks.

Measurements

The following outbreak-detection performance metrics were measured: receiver operating characteristic curve, activity monitoring operating curve, cluster positive predictive value, cluster sensitivity, and algorithm run time.

Results

RSC was computationally efficient. It outperformed the other two spatial algorithms in terms of detection timeliness and outbreak localization. RSC also had overall better timeliness than the time-series algorithm WAD at low false alarm rates.

Conclusion

RSC is an ideal algorithm for analyzing large datasets when the application of other spatial algorithms is not practical. It also allows timely investigation for public health practitioners by providing early detection and well-localized outbreak clusters.

9.

Objective

Many healthcare organizations (HCOs), including Kaiser Permanente, Johns Hopkins, Cleveland Medical Center, and MD Anderson Cancer Center, provide access to online health communities as part of their overall patient support services. The key objective in establishing and running these online health communities is to offer empathic support to patients. Patients' perceived empathy is considered to be critical in patient recovery, specifically by enhancing patients' compliance with treatment protocols and the pace of healing. Most online health communities are characterized by two main functions: informational support and social support. This study examines the relative impact of these two distinct functions (that is, as an information seeking forum and as a social support forum) on patients' perceived empathy in online health communities.

Design

This study tests the impact of two variables that reflect the above functions of online health communities—information seeking effectiveness and perceived social support—on perceived empathy. The model also incorporates the potential moderating effect of homophily on these relationships.

Measurements

A web-based survey was used to collect data from members of the online health communities provided by three major healthcare centers. A regression technique was used to analyze the data to test the hypotheses.

Results

The study finds that it is information seeking effectiveness, rather than social support, that affects patients' perceived empathy in online health communities run by HCOs. The results indicate that HCOs that provide online health communities for their patients need to focus more on developing tools that will make information seeking more effective and efficient.

10.

Objective

The Substitutable Medical Applications, Reusable Technologies (SMART) Platforms project seeks to develop a health information technology platform with substitutable applications (apps) constructed around core services. The authors believe this is a promising approach to driving down healthcare costs, supporting standards evolution, accommodating differences in care workflow, fostering competition in the market, and accelerating innovation.

Materials and methods

The Office of the National Coordinator for Health Information Technology, through the Strategic Health IT Advanced Research Projects (SHARP) Program, funds the project. The SMART team has focused on enabling the property of substitutability through an app programming interface leveraging web standards, presenting predictable data payloads, and abstracting away many details of enterprise health information technology systems. Containers—health information technology systems, such as electronic health records (EHR), personally controlled health records, and health information exchanges that use the SMART app programming interface or a portion of it—marshal data sources and present data simply, reliably, and consistently to apps.

Results

The SMART team has completed the first phase of the project: (a) defining an app programming interface, (b) developing containers, and (c) producing a set of charter apps that showcase the system capabilities. A focal point of this phase was the SMART Apps Challenge, publicized by the White House using the http://www.challenge.gov website, and generating 15 app submissions with diverse functionality.

Conclusion

Key strategic decisions must be made about the most effective market for further disseminating SMART: existing market-leading EHR vendors, new entrants into the EHR market, or other stakeholders such as health information exchanges.

11.

Objective

The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative.

Design

Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification.
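A rule-and-lookup approach of the kind described above might look like the following sketch. The patterns and the frequency list are hypothetical examples written for illustration; they are not the authors' rules, and MetaMap is not called here.

```python
import re
from typing import Dict, Optional

# Assumed dose pattern: a number followed by a common unit.
DOSE_RE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml|units?)\b", re.IGNORECASE)

# Assumed lookup list of frequency expressions (longer phrases listed first).
FREQUENCY_TERMS = ["once daily", "twice daily", "daily", "bid", "tid", "qid", "prn"]
FREQ_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in FREQUENCY_TERMS) + r")\b",
    re.IGNORECASE,
)


def extract_order_fields(sentence: str) -> Dict[str, Optional[str]]:
    """Return the first dose and frequency found in a sentence, or None for each."""
    dose = DOSE_RE.search(sentence)
    freq = FREQ_RE.search(sentence)
    return {
        "dose": dose.group(0) if dose else None,
        "frequency": freq.group(0).lower() if freq else None,
    }


fields = extract_order_fields("Started lisinopril 10 mg twice daily for hypertension.")
```

Fields such as dose and frequency have a small, regular vocabulary, which is why simple patterns go a long way; reasons and durations are phrased far more freely and are poorly served by this style of rule.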

Measurements

Evaluation metrics and corresponding results were provided by the Challenge organizers.

Results

The results indicate that robust rule-based tools achieve satisfactory results in extraction of simple elements of medication orders, but more sophisticated methods are needed for identification of reasons for the orders and durations.

Limitations

Owing to the time constraints and nature of the Challenge, some obvious follow-on analysis has not been completed yet.

Conclusions

The authors plan to integrate the new modules with MetaMap to enhance its accuracy. This integration effort will provide guidance in retargeting existing tools for better processing of clinical text.

12.

Objective

To formulate a model for translating manual infection control surveillance methods to automated, algorithmic approaches.

Design

We propose a model for creating electronic surveillance algorithms by translating existing manual surveillance practices into automated electronic methods. Our model suggests that three dimensions of expert knowledge be consulted: clinical, surveillance, and informatics. Once collected, knowledge should be applied through a process of conceptualization, synthesis, programming, and testing.

Results

We applied our framework to central vascular catheter associated bloodstream infection surveillance, a major healthcare performance outcome measure. We found that despite major barriers such as differences in availability of structured data, in types of databases used and in semantic representation of clinical terms, bloodstream infection detection algorithms could be deployed at four very diverse medical centers.

Conclusions

We present a framework that translates existing practice (manual infection detection) to an automated process for surveillance. Our experience details barriers and solutions discovered during development of electronic surveillance for central vascular catheter associated bloodstream infections at four hospitals in a variety of data environments. Moving electronic surveillance to the next level (availability at a majority of acute care hospitals nationwide) would be hastened by the incorporation of necessary data elements, vocabularies and standards into commercially available electronic health records.

13.

Objective

To assess behavioral health providers' beliefs about the benefits and barriers of health information exchange (HIE).

Methods

Survey of a total of 2010 behavioral health providers in a Midwestern state (33% response rate), with questions based on previously reported open-ended beliefs elicitation interviews.

Results

Factor analysis resulted in four groupings: beliefs that HIE would improve care and communication, add cost and time burdens, present access and vulnerability concerns, and impact workflow and control (positively and negatively). A regression model including all four factors parsimoniously predicted attitudes toward HIE. Providers clustered into two groups based on their beliefs: a majority (67%) were positive about the impact of HIE, and the remainder (33%) were negative. There were some professional/demographic differences between the two clusters of providers.

Discussion

Most behavioral health providers are supportive of HIE; however, their adoption and use of it may continue to lag behind that of medical providers due to perceived cost and time burdens and concerns about access to and vulnerability of information.

14.

Background

The literature describes teenagers as active users of social media, who seem to care about privacy, but who also reveal a considerable amount of personal information. There have been no studies of how they manage personal health information on social media.

Objective

To understand how chronically ill teenage patients manage their privacy on social media sites.

Design

A qualitative study based on a content analysis of semistructured interviews with 20 hospital patients (12–18 years).

Results

Most teenage patients do not disclose their personal health information on social media, even though the study found a pervasive use of Facebook. Facebook is a place to be a “regular”, rather than a sick teenager. It is a place where teenage patients stay up-to-date about their social life; it is not seen as a place to discuss their diagnosis and treatment. The majority of teenage patients don't use social media to come into contact with others with similar conditions and they don't use the internet to find health information about their diagnosis.

Conclusions

Social media play an important role in the social life of teenage patients. They enable young patients to be “regular” teenagers. Teenage patients' online privacy behavior is an expression of their need for self-definition and self-protection.

15.

Background

Providing patients with access to their medical data is widely expected to help educate and empower them to manage their own health. Health information exchange (HIE) infrastructures could potentially help patients access records across multiple healthcare providers. We studied three HIE organizations as they developed portals to give consumers access to HIE data previously exchanged only among healthcare organizations.

Objective

To follow the development of new consumer portal technologies, and to identify barriers and facilitators to patient access to HIE data.

Methods

Semistructured interviews of 15 key informants over a 2-year period spanning the development and early implementation of three new projects, coded according to a sociotechnical framework.

Results

As the organizations tried to develop functionality that fully served the needs of both providers and patients, plans were altered by technical barriers (primarily related to data standardization) and cultural and legal issues surrounding data access. Organizational changes also played an important role in altering project plans. In all three cases, patient access to data was significantly scaled back from initial plans.

Conclusions

This prospective study revealed how sociotechnical factors previously identified as important in health information technology success and failure helped to shape the evolution of three novel consumer informatics projects. Barriers to providing patients with seamless access to their HIE data were multifactorial. Remedies will have to address technical, organizational, cultural, and other factors.

16.

Objective

To describe a new medication information extraction system—Textractor—developed for the ‘i2b2 medication extraction challenge’. The development, functionalities, and official evaluation of the system are detailed.

Design

Textractor is based on the Apache Unstructured Information Management Architecture (UIMA) framework, and uses methods that are a hybrid between machine learning and pattern matching. Two modules in the system are based on machine learning algorithms, while other modules use regular expressions, rules, and dictionaries, and one module embeds MetaMap Transfer.

Measurements

The official evaluation was based on a reference standard of 251 discharge summaries annotated by all teams participating in the challenge. The metrics used were recall, precision, and the F1-measure. They were calculated with exact and inexact matches, and were averaged at the level of systems and documents.

Results

The reference metric for this challenge, the system-level overall F1-measure, reached about 77% for exact matches, with a recall of 72% and a precision of 83%. Performance was the best with route information (F1-measure about 86%), and was good for dosage and frequency information, with F1-measures of about 82–85%. Results were not as good for durations, with F1-measures of 36–39%, and for reasons, with F1-measures of 24–27%.
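The relationship between the three headline figures can be checked with the standard F1 definition, the harmonic mean of precision and recall. The sketch below plugs in the rounded values quoted above, so the result is only approximate; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures from the text: precision 83%, recall 72%.
overall_f1 = f1_score(0.83, 0.72)
```

With these inputs the value comes out at roughly 0.77, in line with the reported overall F1-measure of about 77%.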

Conclusion

The official evaluation of Textractor for the i2b2 medication extraction challenge demonstrated satisfactory performance. This system was among the 10 best performing systems in this challenge.

17.
18.

Objective

Accurate, understandable public health information is important for ensuring the health of the nation. The large portion of the US population with Limited English Proficiency is best served by translations of public-health information into other languages. However, a large number of health departments and primary care clinics face significant barriers to fulfilling federal mandates to provide multilingual materials to Limited English Proficiency individuals. This article presents a pilot study on the feasibility of using freely available statistical machine translation technology to translate health promotion materials.

Design

The authors gathered health-promotion materials in English from local and national public-health websites. Spanish versions were created by translating the documents using a freely available machine-translation website. Translations were rated for adequacy and fluency, analyzed for errors, manually corrected by a human posteditor, and compared with exclusively manual translations.

Results

Machine translation plus postediting took 15–53 min per document, compared to the reported days or even weeks for the standard translation process. A blind comparison of machine-assisted and human translations of six documents revealed overall equivalency between machine-translated and manually translated materials. The analysis of translation errors indicated that the most important errors were word-sense errors.

Conclusion

The results indicate that machine translation plus postediting may be an effective method of producing multilingual health materials with equivalent quality but lower cost compared to manual translations.

19.

Background

The electronic medical record (EMR)/electronic health record (EHR) is becoming an integral component of many primary-care outpatient practices. Before implementing an EMR/EHR system, primary-care practices should have an understanding of the potential benefits and limitations.

Objective

The objective of this study was to systematically review the recent literature around the impact of the EMR/EHR within primary-care outpatient practices.

Materials and methods

Searches of Medline, EMBASE, CINAHL, ABI Inform, and Cochrane Library were conducted to identify articles published between January 1998 and January 2010. The gray literature and reference lists of included articles were also searched. 30 studies met inclusion criteria.

Results and discussion

The EMR/EHR appears to have structural and process benefits, but the impact on clinical outcomes is less clear. Using Donabedian's framework, five articles focused on the impact on healthcare structure, 21 explored healthcare process issues, and four focused on health-related outcomes.

20.

Background

Pharmacotherapy is an integral part of any medical care process and plays an important role in the medical history of most patients. Information on medication is crucial for several tasks such as pharmacovigilance, medical decision or biomedical research.

Objectives

Within a narrative text, medication-related information can be buried within other non-relevant data. Specific methods, such as those provided by text mining, must be designed for accessing this information, and this is the objective of this study.

Methods

The authors designed a system for analyzing narrative clinical documents to extract from them medication occurrences and medication-related information. The system also attempts to deduce medications not covered by the dictionaries used.

Results

Results provided by the system were evaluated within the framework of the I2B2 NLP challenge held in 2009. The system achieved an F-measure of 0.78 and ranked 7th out of 20 participating teams (the highest F-measure was 0.86). The system provided good results for the annotation and extraction of medication names, their frequency, dosage and mode of administration (F-measure over 0.81), while information on duration and reasons is poorly annotated and extracted (F-measure 0.36 and 0.29, respectively). The performance of the system was stable between the training and test sets.
