Similar literature
1.

Objective

To characterize global structural features of large-scale biomedical terminologies using currently emerging statistical approaches.

Design

Given rapid growth of terminologies, this research was designed to address scalability. We selected 16 terminologies covering a variety of domains from the UMLS Metathesaurus, a collection of terminological systems. Each was modeled as a network in which nodes were atomic concepts and links were relationships asserted by the source vocabulary. For comparison against each terminology we created three random networks of equivalent size and density.

Measurements

Average node degree, node degree distribution, clustering coefficient, average path length.
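
As an illustration of the measurements listed above, the following is a minimal sketch of how average node degree, clustering coefficient, and average path length can be computed for an undirected network stored as an adjacency dictionary. The toy graph and all function names are assumptions made for this example; the abstract does not describe the authors' actual tooling.

```python
from collections import deque


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(average_degree(toy), clustering_coefficient(toy), average_path_length(toy))
```

A small-world network in the sense used here would combine a short average path length with a clustering coefficient well above that of a random network of the same size and density.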

Results

Eight of 16 terminologies exhibited the small-world characteristics of a short average path length and strong local clustering. An overlapping subset of nine exhibited a power law distribution in node degrees, indicative of a scale-free architecture. We attribute these features to specific design constraints. Constraints on node connectivity, common in more synthetic classification systems, localize the effects of changes and deletions. In contrast, small-world and scale-free features, common in comprehensive medical terminologies, promote flexible navigation and less restrictive organic-like growth.

Conclusion

While thought of as synthetic, grid-like structures, some controlled terminologies are structurally indistinguishable from natural language networks. This paradoxical result suggests that terminology structure is shaped not only by formal logic-based semantics, but by rules analogous to those that govern social networks and biological systems. Graph theoretic modeling shows early promise as a framework for describing terminology structure. Deeper understanding of these techniques may inform the development of scalable terminologies and ontologies.

2.
3.

Objective

To study existing problem list terminologies (PLTs), and to identify a subset of concepts based on standard terminologies that occur frequently in problem list data.

Design

Problem list terms and their usage frequencies were collected from large healthcare institutions.

Measurement

The pattern of usage of the terms was analyzed. The local terms were mapped to the Unified Medical Language System (UMLS). Based on the mapped UMLS concepts, the degree of overlap between the PLTs was analyzed.

Results

Six institutions submitted 76 237 terms and their usage frequencies in 14 million patients. The distribution of usage was highly skewed. On average, 21% of unique terms already covered 95% of usage. The most frequently used 14 395 terms, representing the union of terms that covered 95% of usage in each institution, were exhaustively mapped to the UMLS. 13 261 terms were successfully mapped to 6776 UMLS concepts. Less frequently used terms were generally less ‘mappable’ to the UMLS. The mean pairwise overlap of the PLTs was only 21% (median 19%). Concepts that were shared among institutions were used eight times more often than concepts unique to one institution. A SNOMED Problem List Subset of frequently used problem list concepts was identified.
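
The skewed usage distribution reported above can be made concrete with a small sketch that finds the share of unique terms needed to cover a target fraction of total usage. The counts below are invented for illustration and the function name is an assumption; they are not data from the study.

```python
def fraction_of_terms_for_coverage(usage_counts, target=0.95):
    """Share of unique terms (most frequent first) needed to reach `target` of total usage."""
    counts = sorted(usage_counts, reverse=True)
    total = sum(counts)
    running = 0
    for n_terms, count in enumerate(counts, start=1):
        running += count
        if running >= target * total:
            return n_terms / len(counts)
    return 1.0


# Hypothetical usage frequencies for seven local problem list terms.
toy_counts = [60, 25, 8, 3, 2, 1, 1]

print(fraction_of_terms_for_coverage(toy_counts, target=0.95))
```

In this toy case the four most frequent terms out of seven cover 96% of usage, so the function returns 4/7.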

Conclusions

Most of the frequently used problem list terms could be found in standard terminologies. The overlap between existing PLTs was low. The use of the SNOMED Problem List Subset will save developmental effort, reduce variability of PLTs, and enhance interoperability of problem list data.

4.

Objective

This study has two objectives: first, to identify and characterize consumer health terms not found in the Unified Medical Language System (UMLS) Metathesaurus (2007 AB); second, to describe the procedure for creating new concepts in the process of building a consumer health vocabulary. How do the unmapped consumer health concepts relate to the existing UMLS concepts? What is the place of these new concepts in professional medical discourse?

Design

The consumer health terms were extracted from two large corpora derived in the process of Open Access Collaboratory Consumer Health Vocabulary (OAC CHV) building. Terms that could not be mapped to existing UMLS concepts via machine and manual methods prompted creation of new concepts, which were then ascribed semantic types, related to existing UMLS concepts, and coded according to specified criteria.

Results

This approach identified 64 unmapped concepts, 17 of which were labeled as uniquely “lay” and not feasible for inclusion in professional health terminologies. The remaining terms constituted potential candidates for inclusion in professional vocabularies, or could be constructed by post-coordinating existing UMLS terms. The relationship between new and existing concepts differed depending on the corpora from which they were extracted.

Conclusion

Non-mapping concepts constitute a small proportion of consumer health terms, but a proportion that is likely to affect the process of consumer health vocabulary building. We have identified a novel approach for identifying such concepts.

5.

Objective

To evaluate state-of-the-art unsupervised methods on the word sense disambiguation (WSD) task in the clinical domain. In particular, to compare graph-based approaches relying on a clinical knowledge base with bottom-up topic-modeling-based approaches. We investigate several enhancements to the topic-modeling techniques that use domain-specific knowledge sources.

Materials and methods

The graph-based methods use variations of PageRank and distance-based similarity metrics, operating over the Unified Medical Language System (UMLS). Topic-modeling methods use unlabeled data from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database to derive models for each ambiguous word. We investigate the impact of using different linguistic features for topic models, including UMLS-based and syntactic features. We use a sense-tagged clinical dataset from the Mayo Clinic for evaluation.

Results

The topic-modeling methods achieve 66.9% accuracy on a subset of the Mayo Clinic's data, while the graph-based methods only reach the 40–50% range, with a most-frequent-sense baseline of 56.5%. Features derived from the UMLS semantic type and concept hierarchies do not produce a gain over bag-of-words features in the topic models, but identifying phrases from UMLS and using syntax does help.

Discussion

Although topic models outperform graph-based methods, semantic features derived from the UMLS prove too noisy to improve performance beyond bag-of-words.

Conclusions

Topic modeling for WSD provides superior results in the clinical domain; however, integration of knowledge remains to be effectively exploited.

6.

Objective

To address the problem of mapping local laboratory terminologies to Logical Observation Identifiers Names and Codes (LOINC). To study different ontology matching algorithms and investigate how the probability of term combinations in LOINC helps to increase match quality and reduce manual effort.

Materials and methods

We proposed two matching strategies: full name and multi-part. The multi-part approach also considers the occurrence probability of combined concept parts. It can further recommend possible combinations of concept parts to allow more local terms to be mapped. Three real-world laboratory databases from Taiwanese hospitals were used to validate the proposed strategies with respect to different quality measures and execution run time. A comparison with the commonly used tool, Regenstrief LOINC Mapping Assistant (RELMA) Lab Auto Mapper (LAM), was also carried out.

Results

The new multi-part strategy yields the best match quality, with F-measure values between 89% and 96%. It can automatically match 70–85% of the laboratory terminologies to LOINC. The recommendation step can further propose mapping to (proposed) LOINC concepts for 9–20% of the local terminology concepts. On average, 91% of the local terminology concepts can be correctly mapped to existing or newly proposed LOINC concepts.
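
For readers unfamiliar with the match-quality measure quoted above, here is a minimal sketch of how precision, recall, and a balanced F-measure are derived from match counts. The abstract does not say which F-measure weighting was used, so the balanced (F1) form is an assumption, and the counts are invented.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Precision, recall, and balanced F-measure (F1) from match counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical result: 9 correct matches, 1 wrong match, 3 missed local terms.
p, r, f = precision_recall_f1(true_pos=9, false_pos=1, false_neg=3)
print(p, r, f)
```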

Conclusions

The mapping quality of the multi-part strategy is significantly better than that of LAM. It enables domain experts to perform LOINC matching with little manual work. The probability of term combinations proved to be a valuable strategy for increasing the quality of match results, providing recommendations for proposed LOINC concepts, and decreasing the run time for match processing.

7.

Objective

To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components.

Methods

Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed.

Results

The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations.

Conclusions

This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.

8.

Objective

To develop an automated, high-throughput, and reproducible method for reclassifying and validating ontological concepts for natural language processing applications.

Design

We developed a distributional similarity approach to classify the Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. We used contextual features based on syntactic properties obtained from two different large corpora and used α-skew divergence as the similarity measure.
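
The α-skew divergence mentioned above is commonly defined as the Kullback-Leibler divergence of one distribution from a mixture of the two, s_α(q, r) = D(r ‖ αq + (1 − α)r). The sketch below follows that common definition; the value of α, the argument order, and the toy distributions are assumptions, since the abstract does not state them.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def alpha_skew_divergence(q, r, alpha=0.99):
    """s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r), for 0 <= alpha < 1.

    Mixing a little of r into q keeps the divergence finite even when
    q assigns zero probability to a context that r has observed.
    """
    mixture = [alpha * qi + (1 - alpha) * ri for qi, ri in zip(q, r)]
    return kl_divergence(r, mixture)


# Hypothetical context distributions with disjoint support.
q = [1.0, 0.0]
r = [0.0, 1.0]
print(alpha_skew_divergence(q, r))  # finite, whereas plain D(r || q) would be infinite
```

The smoothing by the mixture is the reason this measure suits sparse contextual features: unseen contexts do not make the score undefined.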

Measurements

The testing sets were automatically generated based on the changes by the National Library of Medicine to the semantic classification of concepts from the UMLS 2005AA to the 2006AA release. Error rates were calculated and a misclassification analysis was performed.

Results

The estimated lowest error rates were 0.198 and 0.116 when considering the correct classification to be covered by our top prediction and top 2 predictions, respectively.

Conclusion

The results demonstrated that the distributional similarity approach can recommend high level semantic classification suitable for use in natural language processing.

9.

Objective

Biomedical research increasingly relies on the integration of information from multiple heterogeneous data sources. Despite the fact that structural and terminological aspects of interoperability are interdependent and rely on a common set of requirements, current efforts typically address them in isolation. We propose a unified ontology-based knowledge framework to facilitate interoperability between heterogeneous sources, and investigate if using the LexEVS terminology server is a viable implementation method.

Materials and methods

We developed a framework based on an ontology, the general information model (GIM), to unify structural models and terminologies, together with relevant mapping sets. This allowed a uniform access to these resources within LexEVS to facilitate interoperability by various components and data sources from implementing architectures.

Results

Our unified framework has been tested in the context of the EU Framework Program 7 TRANSFoRm project, where it was used to achieve data integration in a retrospective diabetes cohort study. The GIM was successfully instantiated in TRANSFoRm as the clinical data integration model, and necessary mappings were created to support effective information retrieval for software tools in the project.

Conclusions

We present a novel, unifying approach to address interoperability challenges in heterogeneous data sources, by representing structural and semantic models in one framework. Systems using this architecture can rely solely on the GIM that abstracts over both the structure and coding. Information models, terminologies and mappings are all stored in LexEVS and can be accessed in a uniform manner (implementing the HL7 CTS2 service functional model). The system is flexible and should reduce the effort needed from data sources personnel for implementing and managing the integration.

10.
11.

Background

RxNorm and NDF-RT (National Drug File Reference Terminology) are a suite of terminology standards for clinical drugs designated for use in US federal government systems for the electronic exchange of clinical health information. Analyzing how the drug products described in these terminologies are categorized into drug classes will help improve their organization and the classification of pharmaceutical information.

Methods

Mappings between drug products in RxNorm and NDF-RT drug classes were extracted. Mappings were also extracted between drug products in RxNorm to five high-level NDF-RT categories: Chemical Structure; cellular or subcellular Mechanism of Action; organ-level or system-level Physiologic Effect; Therapeutic Intent; and Pharmacokinetics. Coverage for the mappings and the gaps were evaluated and analyzed algorithmically.

Results

Approximately 54% of RxNorm drug products (Semantic Clinical Drugs) were found not to have a correspondence in NDF-RT. Similarly, approximately 45% of drug products in NDF-RT are missing from RxNorm, most of which can be attributed to differences in dosage, strength, and route form. Approximately 81% of Chemical Structure classes, 42% of Mechanism of Action classes, 75% of Physiologic Effect classes, 76% of Therapeutic Intent classes, and 88% of Pharmacokinetics classes were also found not to have any RxNorm drug products classified under them. Finally, various issues regarding inconsistent mappings between drug concepts were identified in both terminologies.

Conclusion

This investigation identified potential limitations of the existing classification systems and various issues in specification of correspondences between the concepts in RxNorm and NDF-RT. These proposals and methods provide the preliminary steps in addressing some of the requirements.

12.
13.

Background

Visual information is a crucial aspect of medical knowledge. Building a comprehensive medical image base, in the spirit of the Unified Medical Language System (UMLS), would greatly benefit patient education and self-care. However, collection and annotation of such a large-scale image base is challenging.

Objective

To combine visual object detection techniques with medical ontology to automatically mine web photos and retrieve a large number of disease manifestation images with minimal manual labeling effort.

Methods

As a proof of concept, we first learnt five organ detectors on three detection scales for eyes, ears, lips, hands, and feet. Given a disease, we used information from the UMLS to select affected body parts, ran the pretrained organ detectors on web images, and combined the detection outputs to retrieve disease images.

Results

Compared with a supervised image retrieval approach that requires training images for every disease, our ontology-guided approach exploits shared visual information of body parts across diseases. In retrieving 2220 web images of 32 diseases, we reduced manual labeling effort to 15.6% while improving the average precision by 3.9% from 77.7% to 81.6%. For 40.6% of the diseases, we improved the precision by 10%.

Conclusions

The results confirm the concept that the web is a feasible source for automatic disease image retrieval for health image database construction. Our approach requires a small amount of manual effort to collect complex disease images, and to annotate them by standard medical ontology terms.

14.

Objective

Ensuring the security and appropriate use of patient health information contained within electronic medical records systems is challenging. Observing these difficulties, we present an addition to the explanation-based auditing system (EBAS) that attempts to determine the clinical or operational reason why accesses occur to medical records based on patient diagnosis information. Accesses that can be explained with a reason are filtered so that the compliance officer has fewer suspicious accesses to review manually.

Methods

Our hypothesis is that specific hospital employees are responsible for treating a given diagnosis. For example, Dr Carl accessed Alice's medical record because Hem/Onc employees are responsible for chemotherapy patients. We present metrics to determine which employees are responsible for a diagnosis and quantify their confidence. The auditing system attempts to use this responsibility information to determine the reason why an access occurred. We evaluate the auditing system's classification quality using data from the University of Michigan Health System.

Results

The EBAS correctly determines which departments are responsible for a given diagnosis. Adding this responsibility information to the EBAS increases the number of first accesses explained by a factor of two over previous work and explains over 94% of all accesses with high precision.

Conclusions

The EBAS serves as a complementary security tool for personal health information. It filters a majority of accesses such that it is more feasible for a compliance officer to review the remaining suspicious accesses manually.

15.
16.

Background

Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS) and evaluated the contribution of WSD to clinical text classification.

Methods

We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus.

Results

Our system compared favorably with other knowledge-based methods. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts.

Conclusions

We developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and showed that our WSD system improves clinical document classification.

Data sharing

We integrated our WSD system with MetaMap and the clinical Text Analysis and Knowledge Extraction System, two popular biomedical natural language processing systems. All code required to reproduce our results and all tools developed as part of this study are released as open source, available at http://code.google.com/p/ytex.

17.

Purpose

To examine the impact of a personal health record (PHR) on medication-use safety among older adults.

Background

Online PHRs have potential as tools to manage health information. We know little about how to make PHRs accessible for older adults and what effects this will have.

Methods

A PHR was designed and pretested with older adults and tested in a 6-month randomized controlled trial. After completing mailed baseline questionnaires, eligible computer users aged 65 and over were randomized 3:1 to be given access to a PHR (n=802) or serve as a standard care control group (n=273). Follow-up questionnaires measured change from baseline medication use, medication reconciliation behaviors, and medication management problems.

Results

Older adults were interested in keeping track of their health and medication information. A majority (55.2%) logged into the PHR and used it, but only 16.1% used it frequently. At follow-up, those randomized to the PHR group were significantly less likely to use multiple non-steroidal anti-inflammatory drugs—the most common warning generated by the system (viewed by 23% of participants). Compared with low/non-users, high users reported significantly more changes in medication use and improved medication reconciliation behaviors, and recognized significantly more side effects, but there was no difference in use of inappropriate medications or adherence measures.

Conclusions

PHRs can engage older adults for better medication self-management; however, features that motivate continued use will be needed. Longer-term studies of continued users will be required to evaluate the impact of these changes in behavior on patient health outcomes.

18.

Objective

The Department of Veterans Affairs (VA) operates one of the largest nationwide healthcare systems and is increasing use of internet technology, including development of an online personal health record system called My HealtheVet. This study examined internet use among veterans in general and particularly use of online health information among VA patients and specifically mental health service users.

Methods

A nationally representative sample of 7215 veterans from the 2010 National Survey of Veterans was used. Logistic regression was employed to examine background characteristics associated with internet use and My HealtheVet.

Results

71% of veterans reported using the internet and about a fifth reported using My HealtheVet. Veterans who were younger, more educated, white, married, and had higher incomes were more likely to use the internet. There was no association between background characteristics and use of My HealtheVet. Mental health service users were no less likely to use the internet or My HealtheVet than other veterans.

Discussion

Most veterans are willing to access VA information online, although many VA service users do not use My HealtheVet, suggesting more education and research is needed to reduce barriers to its use.

Conclusion

Although adoption of My HealtheVet has been slow, the majority of veterans, including mental health service users, use the internet and indicate a willingness to receive and interact with health information online.

19.

Objective

This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia.

Design

A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach which uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification.
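
The feature selection step described above ranks features by how strongly they are associated with the two classes. The abstract does not name the significance test, so the sketch below assumes a chi-square statistic on a 2×2 contingency table as one plausible choice; the toy documents, labels, and function names are all hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table.

    a: positive docs with the feature    b: negative docs with the feature
    c: positive docs without it          d: negative docs without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_features(docs, labels):
    """Return features sorted by descending chi-square score (ties broken by name)."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    scores = {}
    for feature in set().union(*docs):
        a = sum(1 for doc, y in zip(docs, labels) if y and feature in doc)
        b = sum(1 for doc, y in zip(docs, labels) if not y and feature in doc)
        scores[feature] = chi_square_2x2(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical notes reduced to feature sets; True marks a pneumonia-positive patient.
docs = [{"infiltrate", "fever"}, {"infiltrate", "cough"}, {"fever"}, {"cough"}]
labels = [True, True, False, False]

print(rank_features(docs, labels))
```

Keeping only the top-ranked features before training is what allows the classifier to ignore the large uninformative part of the feature space.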

Results

Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively.

Conclusion

Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves the performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves the system performance.

20.

Background

Cannabis abuse has been associated with psychiatric disorders.

Methods

The pattern of cannabis use and incidence of cannabis dependence and cannabis psychosis among 471 consecutive patients admitted to a tertiary care psychiatric center was investigated.

Results

Cannabis use was reported by 67 (14.23%) patients of whom 42 (8.92%) were occasional users, 18 (3.82%) were classified as frequent users while 7 (1.49%) fulfilled criteria for cannabis dependence. 3 (0.64%) patients showed symptoms which were characteristic of cannabis psychosis. Among the 67 cannabis users, 56 (83.58%) had their first exposure to cannabis before entering service at 13-19 years of age. The remaining 14 (16.09%) began consuming cannabis 1-5 years after joining service.

Conclusion

The reasons given for using cannabis were curiosity about its effects 32 (47.76%), peer pressure 17 (25.37%) or traditional use during festivals 18 (26.87%).

Key Words: Cannabis dependence, Psychiatric disorders
