Found 20 similar articles; search took 31 ms
1.
Objective
To characterize global structural features of large-scale biomedical terminologies using currently emerging statistical approaches.
Design
Given rapid growth of terminologies, this research was designed to address scalability. We selected 16 terminologies covering a variety of domains from the UMLS Metathesaurus, a collection of terminological systems. Each was modeled as a network in which nodes were atomic concepts and links were relationships asserted by the source vocabulary. For comparison against each terminology we created three random networks of equivalent size and density.
Measurements
Average node degree, node degree distribution, clustering coefficient, average path length.
Results
Eight of 16 terminologies exhibited the small-world characteristics of a short average path length and strong local clustering. An overlapping subset of nine exhibited a power law distribution in node degrees, indicative of a scale-free architecture. We attribute these features to specific design constraints. Constraints on node connectivity, common in more synthetic classification systems, localize the effects of changes and deletions. In contrast, small-world and scale-free features, common in comprehensive medical terminologies, promote flexible navigation and less restrictive organic-like growth.
Conclusion
While thought of as synthetic, grid-like structures, some controlled terminologies are structurally indistinguishable from natural language networks. This paradoxical result suggests that terminology structure is shaped not only by formal logic-based semantics, but by rules analogous to those that govern social networks and biological systems. Graph theoretic modeling shows early promise as a framework for describing terminology structure. Deeper understanding of these techniques may inform the development of scalable terminologies and ontologies.
2.
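As an illustration of the measurements named in entry 1 (average node degree, clustering coefficient, average path length), the following is a minimal sketch, not the authors' implementation. It assumes an undirected network stored as a dictionary of neighbor sets; the concept names in `toy` are hypothetical and are not drawn from any UMLS source vocabulary.

```python
from collections import deque
from itertools import combinations


def average_degree(adj):
    """Mean number of links per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Mean local clustering: share of a node's neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # a node with fewer than two neighbors contributes 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs, via BFS."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical four-concept network (illustrative only).
toy = {
    "disease": {"infection", "neoplasm", "pneumonia"},
    "infection": {"disease", "pneumonia"},
    "neoplasm": {"disease"},
    "pneumonia": {"disease", "infection"},
}

print(average_degree(toy))
print(clustering_coefficient(toy))
print(average_path_length(toy))
```

A small-world network in the sense used in the abstract would show a clustering coefficient well above that of a random network of equal size and density, with a comparably short average path length; generating that random baseline is not shown here.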
3.
Objective
To study existing problem list terminologies (PLTs), and to identify a subset of concepts based on standard terminologies that occur frequently in problem list data.
Design
Problem list terms and their usage frequencies were collected from large healthcare institutions.
Measurement
The pattern of usage of the terms was analyzed. The local terms were mapped to the Unified Medical Language System (UMLS). Based on the mapped UMLS concepts, the degree of overlap between the PLTs was analyzed.
Results
Six institutions submitted 76 237 terms and their usage frequencies in 14 million patients. The distribution of usage was highly skewed. On average, 21% of unique terms already covered 95% of usage. The most frequently used 14 395 terms, representing the union of terms that covered 95% of usage in each institution, were exhaustively mapped to the UMLS. 13 261 terms were successfully mapped to 6776 UMLS concepts. Less frequently used terms were generally less ‘mappable’ to the UMLS. The mean pairwise overlap of the PLTs was only 21% (median 19%). Concepts that were shared among institutions were used eight times more often than concepts unique to one institution. A SNOMED Problem List Subset of frequently used problem list concepts was identified.
Conclusions
Most of the frequently used problem list terms could be found in standard terminologies. The overlap between existing PLTs was low. The use of the SNOMED Problem List Subset will save developmental effort, reduce variability of PLTs, and enhance interoperability of problem list data.
4.
Alla Keselman Catherine Arnott Smith Guy Divita Hyeoneui Kim Allen C. Browne Gondy Leroy Qing Zeng-Treitler 《J Am Med Inform Assoc》2008,15(4):496-505
Objective
This study has two objectives: first, to identify and characterize consumer health terms not found in the Unified Medical Language System (UMLS) Metathesaurus (2007 AB); second, to describe the procedure for creating new concepts in the process of building a consumer health vocabulary. How do the unmapped consumer health concepts relate to the existing UMLS concepts? What is the place of these new concepts in professional medical discourse?
Design
The consumer health terms were extracted from two large corpora derived in the process of Open Access Collaboratory Consumer Health Vocabulary (OAC CHV) building. Terms that could not be mapped to existing UMLS concepts via machine and manual methods prompted creation of new concepts, which were then ascribed semantic types, related to existing UMLS concepts, and coded according to specified criteria.
Results
This approach identified 64 unmapped concepts, 17 of which were labeled as uniquely “lay” and not feasible for inclusion in professional health terminologies. The remaining terms constituted potential candidates for inclusion in professional vocabularies, or could be constructed by post-coordinating existing UMLS terms. The relationship between new and existing concepts differed depending on the corpora from which they were extracted.
Conclusion
Non-mapping concepts constitute a small proportion of consumer health terms, but a proportion that is likely to affect the process of consumer health vocabulary building. We have identified a novel approach for identifying such concepts.
5.
Objective
To evaluate state-of-the-art unsupervised methods on the word sense disambiguation (WSD) task in the clinical domain. In particular, to compare graph-based approaches relying on a clinical knowledge base with bottom-up topic-modeling-based approaches. We investigate several enhancements to the topic-modeling techniques that use domain-specific knowledge sources.
Materials and methods
The graph-based methods use variations of PageRank and distance-based similarity metrics, operating over the Unified Medical Language System (UMLS). Topic-modeling methods use unlabeled data from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database to derive models for each ambiguous word. We investigate the impact of using different linguistic features for topic models, including UMLS-based and syntactic features. We use a sense-tagged clinical dataset from the Mayo Clinic for evaluation.
Results
The topic-modeling methods achieve 66.9% accuracy on a subset of the Mayo Clinic's data, while the graph-based methods only reach the 40–50% range, with a most-frequent-sense baseline of 56.5%. Features derived from the UMLS semantic type and concept hierarchies do not produce a gain over bag-of-words features in the topic models, but identifying phrases from UMLS and using syntax does help.
Discussion
Although topic models outperform graph-based methods, semantic features derived from the UMLS prove too noisy to improve performance beyond bag-of-words.
Conclusions
Topic modeling for WSD provides superior results in the clinical domain; however, integration of knowledge remains to be effectively exploited.
6.
Li-Hui Lee Anika Groß Michael Hartung Der-Ming Liou Erhard Rahm 《J Am Med Inform Assoc》2014,21(5):792-800
Objective
To address the problem of mapping local laboratory terminologies to Logical Observation Identifiers Names and Codes (LOINC). To study different ontology matching algorithms and investigate how the probability of term combinations in LOINC helps to increase match quality and reduce manual effort.
Materials and methods
We proposed two matching strategies: full name and multi-part. The multi-part approach also considers the occurrence probability of combined concept parts. It can further recommend possible combinations of concept parts to allow more local terms to be mapped. Three real-world laboratory databases from Taiwanese hospitals were used to validate the proposed strategies with respect to different quality measures and execution run time. A comparison with the commonly used tool, Regenstrief LOINC Mapping Assistant (RELMA) Lab Auto Mapper (LAM), was also carried out.
Results
The new multi-part strategy yields the best match quality, with F-measure values between 89% and 96%. It can automatically match 70–85% of the laboratory terminologies to LOINC. The recommendation step can further propose mapping to (proposed) LOINC concepts for 9–20% of the local terminology concepts. On average, 91% of the local terminology concepts can be correctly mapped to existing or newly proposed LOINC concepts.
Conclusions
The mapping quality of the multi-part strategy is significantly better than that of LAM. It enables domain experts to perform LOINC matching with little manual work. The probability of term combinations proved to be a valuable strategy for increasing the quality of match results, providing recommendations for proposed LOINC concepts, and decreasing the run time for match processing.
7.
Daniel Albright Arrick Lanfranchi Anwen Fredriksen William F Styler IV Colin Warner Jena D Hwang Jinho D Choi Dmitriy Dligach Rodney D Nielsen James Martin Wayne Ward Martha Palmer Guergana K Savova 《J Am Med Inform Assoc》2013,20(5):922-930
Objective
To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components.
Methods
Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed.
Results
The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations.
Conclusions
This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.
8.
Objective
To develop an automated, high-throughput, and reproducible method for reclassifying and validating ontological concepts for natural language processing applications.
Design
We developed a distributional similarity approach to classify the Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. We used contextual features based on syntactic properties obtained from two different large corpora and used α-skew divergence as the similarity measure.
Measurements
The testing sets were automatically generated based on the changes by the National Library of Medicine to the semantic classification of concepts from the UMLS 2005AA to the 2006AA release. Error rates were calculated and a misclassification analysis was performed.
Results
The estimated lowest error rates were 0.198 and 0.116 when considering the correct classification to be covered by our top prediction and top 2 predictions, respectively.
Conclusion
The results demonstrated that the distributional similarity approach can recommend high level semantic classification suitable for use in natural language processing.
9.
Jean-François Ethier Olivier Dameron Vasa Curcin Mark M McGilchrist Robert A Verheij Theodoros N Arvanitis Adel Taweel Brendan C Delaney Anita Burgun 《J Am Med Inform Assoc》2013,20(5):986-994
Objective
Biomedical research increasingly relies on the integration of information from multiple heterogeneous data sources. Despite the fact that structural and terminological aspects of interoperability are interdependent and rely on a common set of requirements, current efforts typically address them in isolation. We propose a unified ontology-based knowledge framework to facilitate interoperability between heterogeneous sources, and investigate if using the LexEVS terminology server is a viable implementation method.
Materials and methods
We developed a framework based on an ontology, the general information model (GIM), to unify structural models and terminologies, together with relevant mapping sets. This allowed a uniform access to these resources within LexEVS to facilitate interoperability by various components and data sources from implementing architectures.
Results
Our unified framework has been tested in the context of the EU Framework Program 7 TRANSFoRm project, where it was used to achieve data integration in a retrospective diabetes cohort study. The GIM was successfully instantiated in TRANSFoRm as the clinical data integration model, and necessary mappings were created to support effective information retrieval for software tools in the project.
Conclusions
We present a novel, unifying approach to address interoperability challenges in heterogeneous data sources, by representing structural and semantic models in one framework. Systems using this architecture can rely solely on the GIM that abstracts over both the structure and coding. Information models, terminologies and mappings are all stored in LexEVS and can be accessed in a uniform manner (implementing the HL7 CTS2 service functional model). The system is flexible and should reduce the effort needed from data sources personnel for implementing and managing the integration.
10.
11.
Background
The RxNorm and NDF-RT (National Drug File Reference Terminology) are a suite of terminology standards for clinical drugs designated for use in the US federal government systems for electronic exchange of clinical health information. Analyzing how different drug products described in these terminologies are categorized into drug classes will help in their better organization and classification of pharmaceutical information.
Methods
Mappings between drug products in RxNorm and NDF-RT drug classes were extracted. Mappings were also extracted between drug products in RxNorm to five high-level NDF-RT categories: Chemical Structure; cellular or subcellular Mechanism of Action; organ-level or system-level Physiologic Effect; Therapeutic Intent; and Pharmacokinetics. Coverage for the mappings and the gaps were evaluated and analyzed algorithmically.
Results
Approximately 54% of RxNorm drug products (Semantic Clinical Drugs) were found not to have a correspondence in NDF-RT. Similarly, approximately 45% of drug products in NDF-RT are missing from RxNorm, most of which can be attributed to differences in dosage, strength, and route form. Approximately 81% of Chemical Structure classes, 42% of Mechanism of Action classes, 75% of Physiologic Effect classes, 76% of Therapeutic Intent classes, and 88% of Pharmacokinetics classes were also found not to have any RxNorm drug products classified under them. Finally, various issues regarding inconsistent mappings between drug concepts were identified in both terminologies.
Conclusion
This investigation identified potential limitations of the existing classification systems and various issues in specification of correspondences between the concepts in RxNorm and NDF-RT. These proposals and methods provide the preliminary steps in addressing some of the requirements.
12.
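The coverage figures reported for entry 11 (the share of drug products with no corresponding class, and the share of classes with no drug product under them) can be illustrated with a small sketch. This is only an assumption about how such gaps might be counted, not the method used in the study; every identifier below (`product_a`, `class_x`, and so on) is hypothetical and does not correspond to real RxNorm or NDF-RT content.

```python
def share_without_match(items, mapping):
    """Return the fraction of items that map to nothing in `mapping`."""
    items = list(items)
    unmatched = [item for item in items if not mapping.get(item)]
    return len(unmatched) / len(items)


# Hypothetical drug-product -> drug-class mappings (illustrative only).
product_to_classes = {
    "product_a": {"class_x"},
    "product_b": {"class_x", "class_y"},
    "product_c": set(),  # listed, but mapped to no class
}
products = ["product_a", "product_b", "product_c", "product_d"]
classes = ["class_x", "class_y", "class_z"]

# Invert the mapping to find which classes have at least one product.
class_to_products = {}
for product, mapped in product_to_classes.items():
    for cls in mapped:
        class_to_products.setdefault(cls, set()).add(product)

unmapped_products = share_without_match(products, product_to_classes)
empty_classes = share_without_match(classes, class_to_products)

print(f"products without a class: {unmapped_products:.0%}")
print(f"classes without a product: {empty_classes:.0%}")
```

Counting gaps in both directions matters because the two shares are independent: a terminology can classify most of its products while still leaving many classes empty.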
13.
Background
Visual information is a crucial aspect of medical knowledge. Building a comprehensive medical image base, in the spirit of the Unified Medical Language System (UMLS), would greatly benefit patient education and self-care. However, collection and annotation of such a large-scale image base is challenging.
Objective
To combine visual object detection techniques with medical ontology to automatically mine web photos and retrieve a large number of disease manifestation images with minimal manual labeling effort.
Methods
As a proof of concept, we first learnt five organ detectors on three detection scales for eyes, ears, lips, hands, and feet. Given a disease, we used information from the UMLS to select affected body parts, ran the pretrained organ detectors on web images, and combined the detection outputs to retrieve disease images.
Results
Compared with a supervised image retrieval approach that requires training images for every disease, our ontology-guided approach exploits shared visual information of body parts across diseases. In retrieving 2220 web images of 32 diseases, we reduced manual labeling effort to 15.6% while improving the average precision by 3.9% from 77.7% to 81.6%. For 40.6% of the diseases, we improved the precision by 10%.
Conclusions
The results confirm the concept that the web is a feasible source for automatic disease image retrieval for health image database construction. Our approach requires a small amount of manual effort to collect complex disease images, and to annotate them by standard medical ontology terms.
14.
Objective
Ensuring the security and appropriate use of patient health information contained within electronic medical records systems is challenging. Observing these difficulties, we present an addition to the explanation-based auditing system (EBAS) that attempts to determine the clinical or operational reason why accesses occur to medical records based on patient diagnosis information. Accesses that can be explained with a reason are filtered so that the compliance officer has fewer suspicious accesses to review manually.
Methods
Our hypothesis is that specific hospital employees are responsible for treating a given diagnosis. For example, Dr Carl accessed Alice's medical record because Hem/Onc employees are responsible for chemotherapy patients. We present metrics to determine which employees are responsible for a diagnosis and quantify their confidence. The auditing system attempts to use this responsibility information to determine the reason why an access occurred. We evaluate the auditing system's classification quality using data from the University of Michigan Health System.
Results
The EBAS correctly determines which departments are responsible for a given diagnosis. Adding this responsibility information to the EBAS increases the number of first accesses explained by a factor of two over previous work and explains over 94% of all accesses with high precision.
Conclusions
The EBAS serves as a complementary security tool for personal health information. It filters a majority of accesses such that it is more feasible for a compliance officer to review the remaining suspicious accesses manually.
15.
16.
Background
Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS) and evaluated the contribution of WSD to clinical text classification.
Methods
We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus.
Results
Our system compared favorably with other knowledge-based methods. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts.
Conclusions
We developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and showed that our WSD system improves clinical document classification.
Data sharing
We integrated our WSD system with MetaMap and the clinical Text Analysis and Knowledge Extraction System, two popular biomedical natural language processing systems. All codes required to reproduce our results and all tools developed as part of this study are released as open source, available under http://code.google.com/p/ytex.
17.
Elizabeth A Chrischilles Juan Pablo Hourcade William Doucette David Eichmann Brian Gryzlak Ryan Lorentzen Kara Wright Elena Letuchy Michael Mueller Karen Farris Barcey Levy 《J Am Med Inform Assoc》2014,21(4):679-686
Purpose
To examine the impact of a personal health record (PHR) on medication-use safety among older adults.
Background
Online PHRs have potential as tools to manage health information. We know little about how to make PHRs accessible for older adults and what effects this will have.
Methods
A PHR was designed and pretested with older adults and tested in a 6-month randomized controlled trial. After completing mailed baseline questionnaires, eligible computer users aged 65 and over were randomized 3:1 to be given access to a PHR (n=802) or serve as a standard care control group (n=273). Follow-up questionnaires measured change from baseline medication use, medication reconciliation behaviors, and medication management problems.
Results
Older adults were interested in keeping track of their health and medication information. A majority (55.2%) logged into the PHR and used it, but only 16.1% used it frequently. At follow-up, those randomized to the PHR group were significantly less likely to use multiple non-steroidal anti-inflammatory drugs, the most common warning generated by the system (viewed by 23% of participants). Compared with low/non-users, high users reported significantly more changes in medication use and improved medication reconciliation behaviors, and recognized significantly more side effects, but there was no difference in use of inappropriate medications or adherence measures.
Conclusions
PHRs can engage older adults for better medication self-management; however, features that motivate continued use will be needed. Longer-term studies of continued users will be required to evaluate the impact of these changes in behavior on patient health outcomes.
18.
Objective
The Department of Veterans Affairs (VA) operates one of the largest nationwide healthcare systems and is increasing use of internet technology, including development of an online personal health record system called My HealtheVet. This study examined internet use among veterans in general and particularly use of online health information among VA patients and specifically mental health service users.
Methods
A nationally representative sample of 7215 veterans from the 2010 National Survey of Veterans was used. Logistic regression was employed to examine background characteristics associated with internet use and My HealtheVet.
Results
71% of veterans reported using the internet and about a fifth reported using My HealtheVet. Veterans who were younger, more educated, white, married, and had higher incomes were more likely to use the internet. There was no association between background characteristics and use of My HealtheVet. Mental health service users were no less likely to use the internet or My HealtheVet than other veterans.
Discussion
Most veterans are willing to access VA information online, although many VA service users do not use My HealtheVet, suggesting more education and research is needed to reduce barriers to its use.
Conclusion
Although adoption of My HealtheVet has been slow, the majority of veterans, including mental health service users, use the internet and indicate a willingness to receive and interact with health information online.
19.
Objective
This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia.
Design
A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach which uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification.
Results
Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively.
Conclusion
Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves the performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves the system performance.
20.
S Chaudhury S Sudarsanan SK Salujha K Srivastava 《Medical Journal Armed Forces India》2005,61(2):117-120