Similar articles (20 results)
1.
Interpretation of semantic propositions in free-text documents such as MEDLINE citations would provide valuable support for biomedical applications, and several approaches to semantic interpretation are being pursued in the biomedical informatics community. In this paper, we describe a methodology for interpreting linguistic structures that encode hypernymic propositions, in which a more specific concept is in a taxonomic relationship with a more general concept. In order to effectively process these constructions, we exploit underspecified syntactic analysis and structured domain knowledge from the Unified Medical Language System (UMLS). After introducing the syntactic processing on which our system depends, we focus on the UMLS knowledge that supports interpretation of hypernymic propositions. We first use semantic groups from the Semantic Network to ensure that the two concepts involved are compatible; hierarchical information in the Metathesaurus then determines which concept is more general and which more specific. A preliminary evaluation of a sample based on the semantic group Chemicals and Drugs provides 83% precision. An error analysis was conducted and potential solutions to the problems encountered are presented. The research discussed here serves as a paradigm for investigating the interaction between domain knowledge and linguistic structure in natural language processing, and could also make a contribution to research on automatic processing of discourse structure. Additional implications of the system we present include its integration in advanced semantic interpretation processors for biomedical text and its use for information extraction in specific domains. The approach has the potential to support a range of applications, including information retrieval and ontology engineering.
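The two-step check described in this abstract (semantic-group compatibility first, then hierarchy to decide direction) can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionaries `SEMANTIC_GROUP` and `PARENTS` are hypothetical toy data, not UMLS content or any UMLS API, and the function names are invented for the example.

```python
from typing import Optional, Set, Tuple

# Hypothetical toy data standing in for UMLS semantic groups (assumption).
SEMANTIC_GROUP = {
    "aspirin": "Chemicals and Drugs",
    "analgesic": "Chemicals and Drugs",
    "headache": "Disorders",
}

# Hypothetical toy hierarchy: concept -> direct parents (assumption).
PARENTS = {
    "aspirin": {"nsaid"},
    "nsaid": {"analgesic"},
}


def ancestors(concept: str) -> Set[str]:
    """Collect every concept reachable by following parent links."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def interpret_hypernym(a: str, b: str) -> Optional[Tuple[str, str, str]]:
    """Return (specific, 'ISA', general), or None if no proposition holds."""
    group_a = SEMANTIC_GROUP.get(a)
    group_b = SEMANTIC_GROUP.get(b)
    # Step 1: both concepts must belong to the same semantic group.
    if group_a is None or group_a != group_b:
        return None
    # Step 2: the hierarchy decides which concept is the more general one.
    if b in ancestors(a):
        return (a, "ISA", b)
    if a in ancestors(b):
        return (b, "ISA", a)
    return None


print(interpret_hypernym("analgesic", "aspirin"))
print(interpret_hypernym("aspirin", "headache"))
```

The order of the two arguments does not matter in this sketch; direction comes only from the toy hierarchy, which mirrors the role the abstract assigns to the Metathesaurus.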

2.
OBJECTIVE: The National Library of Medicine's Unified Medical Language System (UMLS) is a rich source of knowledge in the biomedical domain. The UMLS is used for research and development across a wide range of applications. In this paper, we evaluated the coverage of the UMLS against medical terms extracted from Korean medical records and identified differences in concept representation between the two terminology sets. DESIGN AND MEASUREMENT: We measured the concept coverage of the UMLS. For this study, we mapped the clinical terms extracted from the discharge records of Seoul National University Hospital (SNUH) to the UMLS. RESULTS: Thirty-five percent of the entry terms used in chief complaints at SNUH were conceptually matched with UMLS 'Sign or Symptom' concepts. Fifty-eight percent of the terms matched UMLS 'Disease or Syndrome' concepts rather than 'Sign or Symptom' concepts. The remaining 7% were not found among the UMLS concepts. We then analyzed some of the different expression patterns used by the two term sets and addressed issues to be taken into consideration. CONCLUSION: We found that the UMLS was comparable with Korean medical records, since most of the concepts in the Korean medical records were covered by UMLS concepts.

3.
Objective: Automatic summarization of biomedical literature usually relies on domain knowledge from external sources to build rich semantic representations of the documents to be summarized. In this paper, we investigate the impact of the knowledge source used on the quality of the summaries that are generated. Materials and methods: We present a method for representing a set of documents relevant to a given biological entity or topic as a semantic graph of domain concepts and relations. Different graphs are created by using different combinations of ontologies and vocabularies within the UMLS (including GO, SNOMED-CT, HUGO and all available vocabularies in the UMLS) to retrieve domain concepts, and different types of relationships (co-occurrence and semantic relations from the UMLS Metathesaurus and Semantic Network) are used to link the concepts in the graph. The different graphs are then used as input to a summarization system that produces summaries composed of the most relevant sentences from the original documents. Results and conclusions: Our experiments demonstrate that the choice of the knowledge source used to model the text has a significant impact on the quality of the automatic summaries. In particular, we find that, when summarizing gene-related literature, using GO, SNOMED-CT and HUGO to extract domain concepts results in significantly better summaries than using all available vocabularies in the UMLS. This finding suggests that successful biomedical summarization requires the selection of an appropriate knowledge source, whose coverage, specificity and relations must be in accordance with the type of documents to be summarized.

4.
Objectives: This paper proposes a novel semantic method for auditing associative relations in biomedical terminologies. We tested our methodology on two Unified Medical Language System (UMLS) knowledge sources. Methods: We use the UMLS semantic groups as high-level representations of the domain and range of relationships in the Metathesaurus and in the Semantic Network. A mapping created between Metathesaurus relationships and Semantic Network relationships forms the basis for comparing the signatures of a given Metathesaurus relationship to the signatures of the semantic relationship to which it is mapped. The consistency of Metathesaurus relations is studied for each relationship. Results: Of the 177 associative relationships in the Metathesaurus, 84 (48%) exhibit a high degree of consistency with the corresponding Semantic Network relationships. Overall, 63% of the 1.8 M associative relations in the Metathesaurus are consistent with relations in the Semantic Network. Conclusion: The semantics of associative relationships in biomedical terminologies should be defined explicitly by their developers. The Semantic Network would benefit from being extended with new relationships and with new relations for some existing relationships. The UMLS editing environment could take advantage of the correspondence established between relationships in the Metathesaurus and the Semantic Network. Finally, the auditing method also yielded useful information for refining the mapping of associative relationships between the two sources.

5.
Ontologies are widely used for formalizing and organizing the knowledge of a particular domain of interest. This facilitates knowledge sharing and re-use by both people and systems. Ontologies are becoming increasingly important in the biomedical domain since they enable knowledge sharing in a formal, homogeneous and unambiguous way. Knowledge in a rapidly growing field such as biomedicine is usually evolving, and therefore an ontology maintenance process is required to keep ontological knowledge up to date. This work presents our methodology for building a formally defined ontology, maintaining it by exploiting machine learning techniques and domain-specific corpora, and evaluating it using a well-defined experimental setting. The application of this methodology in the allergen domain is then discussed in detail, presenting the ontology built, the specific techniques used and the evaluation settings.

6.
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous works have shown that knowledge constructs comprised of transitively-associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS becomes a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, and the corresponding method to effectively evaluate the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for very efficient large scale knowledge discovery. We demonstrated that it is highly effective to use kDLS paths to prioritize disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large scale knowledge discovery on the UMLS as a whole. Our expectation is that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS for future medical informatics applications.

7.
Introduction: A common bottleneck during ontology evaluation is knowledge acquisition from domain experts for gold standard creation. This paper contributes a novel semi-automated method for evaluating the concept coverage and accuracy of biomedical ontologies by complementing expert knowledge with knowledge automatically extracted from clinical practice guidelines and electronic health records, which minimizes reliance on expensive domain expertise for gold standard generation. Methods: We developed a bacterial clinical infectious diseases ontology (BCIDO) to assist clinical infectious disease treatment decision support. Using a semi-automated method, we integrated diverse knowledge sources, including publicly available infectious disease guidelines from international repositories, electronic health records, and expert-generated infectious disease case scenarios, to generate a compendium of infectious disease knowledge and use it to evaluate the accuracy and coverage of BCIDO. Results: BCIDO has three classes (i.e., infectious disease, antibiotic, bacteria) containing 593 distinct concepts and 2345 distinct concept relationships. Our semi-automated method generated an ID knowledge compendium consisting of 637 concepts and 1554 concept relationships. Overall, BCIDO covered 79% (504/637) of the concepts and 89% (1378/1554) of the concept relationships in the ID compendium. BCIDO coverage of ID compendium concepts was 92% (121/131) for antibiotic, 80% (205/257) for infectious disease, and 72% (178/249) for bacteria. The low coverage of bacterial concepts in BCIDO was due to a difference in concept granularity between BCIDO and infectious disease guidelines. Guidelines and expert-generated scenarios were the richest source of ID concepts and relationships, while patient records provided relatively fewer concepts and relationships. Conclusions: Our semi-automated method was cost-effective for generating a useful knowledge compendium with minimal reliance on domain experts. This method can be useful for continued development and evaluation of biomedical ontologies for better accuracy and coverage.
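The coverage figures in this abstract are simple ratios, so they can be recomputed from the reported counts. The short sketch below does only that; the variable and function names are illustrative, and the counts are the ones quoted in the abstract.

```python
# Reported (covered, total) concept counts per class, taken from the abstract.
COVERAGE = {
    "antibiotic": (121, 131),
    "infectious disease": (205, 257),
    "bacteria": (178, 249),
}


def percent(covered: int, total: int) -> float:
    """Coverage as a percentage of the compendium entries."""
    return 100.0 * covered / total


# Per-class coverage, printed with one decimal place.
for name, (covered, total) in COVERAGE.items():
    print(f"{name}: {percent(covered, total):.1f}%")

# The per-class counts should add up to the overall 504/637 concepts.
covered_sum = sum(covered for covered, _ in COVERAGE.values())
total_sum = sum(total for _, total in COVERAGE.values())
print(f"all concepts: {covered_sum}/{total_sum} = {percent(covered_sum, total_sum):.1f}%")

# Relationship coverage reported as 1378 of 1554.
print(f"relationships: {percent(1378, 1554):.1f}%")
```

The per-class counts do sum to 504 of 637. The bacteria ratio, 178/249, works out to about 71.5%, so the 72% quoted in the abstract appears to be rounded up.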

8.
Domain reference ontologies represent knowledge about a particular part of the world in a way that is independent from specific objectives, through a theory of the domain. An example of a reference ontology in biomedical informatics is the Foundational Model of Anatomy (FMA), an ontology of anatomy that covers the entire range of macroscopic, microscopic, and subcellular anatomy. The purpose of this paper is to explore how two domain reference ontologies, the FMA and the Chemical Entities of Biological Interest (ChEBI) ontology, can be used (i) to align existing terminologies, (ii) to infer new knowledge in ontologies of more complex entities, and (iii) to manage and help reasoning about individual data. We analyze those kinds of usages of these two domain reference ontologies and suggest desiderata for reference ontologies in biomedicine. While a number of groups and communities have investigated general requirements for ontology design and desiderata for controlled medical vocabularies, we are focusing on application purposes. We suggest five desirable characteristics for reference ontologies: good lexical coverage, good coverage in terms of relations, compatibility with standards, modularity, and ability to represent variation in reality.

9.
ONTOFUSION: ontology-based integration of genomic and clinical databases
ONTOFUSION is an ontology-based system designed for biomedical database integration. It is based on two processes: mapping and unification. Mapping is a semi-automated process that uses ontologies to link a database schema with a conceptual framework, named the virtual schema. There are three methodologies for creating virtual schemas, according to the origin of the domain ontology used: (1) top-down (e.g. using an existing ontology, such as the UMLS or Gene Ontology), (2) bottom-up (building a new domain ontology) and (3) a hybrid combination. Unification is an automated process for integrating ontologies and hence the databases to which they are linked. Using these methods, we employed ONTOFUSION to integrate a large number of public genomic and clinical databases, as well as biomedical ontologies.

10.
Traditional Chinese medicine (TCM), as a complete knowledge system, investigates human health conditions through an approach different from that of orthodox medicine. We are developing a unified traditional Chinese medical language system (UTCMLS) through an ontology approach that will support TCM language knowledge storage, concept-based information retrieval and information integration. UTCMLS is a huge knowledge project and a broad collaboration of 16 distributed groups, most of them with no prior experience of formal ontology development. Cooperative and comprehensive ontology engineering is therefore crucial. We use Protégé 2000 to develop an ontology of the concepts and relationships that represent the domain and that will permit storage of TCM knowledge. This paper focuses on the methodology, design and development of the ontology for UTCMLS.

11.
Ontologies are useful tools for sharing and exchanging knowledge. However, ontology construction is complex and often time consuming. In this paper, we present a method for building a bilingual domain ontology from textual and termino-ontological resources intended for semantic annotation and information retrieval of textual documents. This method combines two approaches: ontology learning from texts and the reuse of existing terminological resources. It consists of four steps: (i) term extraction from domain-specific corpora (in French and English) using textual analysis tools, (ii) clustering of terms into concepts organized according to the UMLS Metathesaurus, (iii) ontology enrichment through the alignment of French and English terms using parallel corpora and the integration of new concepts, (iv) refinement and validation of results by domain experts. These validated results are formalized into a domain ontology dedicated to Alzheimer’s disease and related syndromes which is available online (http://lesim.isped.u-bordeaux2.fr/SemBiP/ressources/ontoAD.owl). The latter currently includes 5765 concepts linked by 7499 taxonomic relationships and 10,889 non-taxonomic relationships. Among these results, 439 concepts absent from the UMLS were created and 608 new synonymous French terms were added. The proposed method is sufficiently flexible to be applied to other domains.

12.
In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to enable discrimination between documents of different classes. We developed novel information-theoretic techniques that utilize the taxonomical structure of the Unified Medical Language System (UMLS) to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Integrating Informatics with Biology and the Bedside (I2B2) obesity challenge. The methods we developed improve upon the results of this challenge's top machine-learning based system, and may improve the performance of other machine-learning based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.

13.
In this study we report on potential drug–drug interactions between drugs occurring in patient clinical data. Results are based on relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations (titles and abstracts) using SemRep. The core of our methodology is to construct two potential drug–drug interaction schemas, based on relationships extracted from SemMedDB. In the first schema, Drug1 and Drug2 interact through Drug1’s effect on some gene, which in turn affects Drug2. In the second, Drug1 affects Gene1, while Drug2 affects Gene2. Gene1 and Gene2, together, then have an effect on some biological function. After checking each drug pair from the medication lists of each of 22 patients, we found 19 known and 62 unknown drug–drug interactions using both schemas. For example, our results suggest that the interaction of Lisinopril, an ACE inhibitor commonly prescribed for hypertension, and the antidepressant sertraline can potentially increase the likelihood and possibly the severity of psoriasis. We also assessed the relationships extracted by SemRep from a linguistic perspective and found that the precision of SemRep was 0.58 for 300 randomly selected sentences from MEDLINE. Our study demonstrates that the use of structured knowledge in the form of relationships from the biomedical literature can support the discovery of potential drug–drug interactions occurring in patient clinical data. Moreover, SemMedDB provides a good knowledge resource for expanding the range of drugs, genes, and biological functions considered as elements in various drug–drug interaction pathways.
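The two interaction schemas in this abstract amount to path patterns over "X affects Y" relationships. The sketch below illustrates the idea on a handful of made-up pairs; it does not query SemMedDB, the entity names are placeholders rather than real drugs or genes, and the requirement that Gene1 and Gene2 differ is an assumption of the example rather than something the abstract states.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical (subject, object) pairs meaning "subject affects object".
AFFECTS = {
    ("drugA", "geneX"),
    ("geneX", "drugB"),
    ("drugC", "geneY"),
    ("drugD", "geneZ"),
    ("geneY", "functionF"),
    ("geneZ", "functionF"),
}
GENES = {"geneX", "geneY", "geneZ"}
FUNCTIONS = {"functionF"}


def schema_one(drug1: str, drug2: str) -> List[str]:
    """Schema 1: drug1 affects a gene, and that gene affects drug2."""
    return sorted(
        gene for gene in GENES
        if (drug1, gene) in AFFECTS and (gene, drug2) in AFFECTS
    )


def schema_two(drug1: str, drug2: str) -> List[Tuple[str, str, str]]:
    """Schema 2: drug1 affects gene1, drug2 affects gene2, both genes affect one function."""
    hits = []
    for gene1, gene2, function in product(sorted(GENES), sorted(GENES), sorted(FUNCTIONS)):
        if gene1 == gene2:
            continue  # assumption: the two genes are distinct
        if ((drug1, gene1) in AFFECTS and (drug2, gene2) in AFFECTS
                and (gene1, function) in AFFECTS and (gene2, function) in AFFECTS):
            hits.append((gene1, gene2, function))
    return hits


print(schema_one("drugA", "drugB"))
print(schema_two("drugC", "drugD"))
```

A pair that matches neither pattern simply yields empty lists, which corresponds to "no potential interaction found" under these two schemas.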

14.
The Foundational Model of Anatomy (FMA), initially developed as an enhancement of the anatomical content of UMLS, is a domain ontology of the concepts and relationships that pertain to the structural organization of the human body. It encompasses the material objects from the molecular to the macroscopic levels that constitute the body and associates with them non-material entities (spaces, surfaces, lines, and points) required for describing structural relationships. The disciplined modeling approach employed for the development of the FMA relies on a set of declared principles, high level schemes, Aristotelian definitions and a frame-based authoring environment. We propose the FMA as a reference ontology in biomedical informatics for correlating different views of anatomy, aligning existing and emerging ontologies in bioinformatics, and providing a structure-based template for representing biological functions.

15.
The estimation of the semantic similarity between terms provides a valuable tool to enable the understanding of textual resources. Many semantic similarity computation paradigms have been proposed, either as general-purpose solutions or framed in concrete fields such as biomedicine. In particular, ontology-based approaches have been very successful due to their efficiency, scalability and lack of constraints, and thanks to the availability of large and consensus ontologies (like WordNet or those in the UMLS). These measures, however, are hampered by the fact that only one ontology is exploited and, hence, their recall depends on the ontological detail and coverage. In recent years, some authors have extended some of the existing methodologies to support multiple ontologies. The problem of integrating heterogeneous knowledge sources is tackled by means of simple terminological matchings between ontological concepts. In this paper, we aim to improve these methods by analysing the similarity between the modelled taxonomical knowledge and the structure of different ontologies. As a result, we are able to better discover the commonalities between different ontologies and, hence, improve the accuracy of the similarity estimation. Two methods are proposed to tackle this task. They have been evaluated and compared with related works by means of several widely-used benchmarks of biomedical terms using two standard ontologies (WordNet and MeSH). Results show that our methods correlate better, compared to related works, with the similarity assessments provided by experts in biomedicine.

16.
Objective: Systematic Reviews (SRs) are utilized to summarize evidence from high quality studies and are considered the preferred source of evidence-based practice (EBP). However, conducting SRs can be time and labor intensive due to the high cost of article screening. In previous studies, we demonstrated that established (lexical) article relationships can be used to identify relevant articles efficiently and effectively. Here we propose to enhance article relationships with background semantic knowledge derived from Unified Medical Language System (UMLS) concepts and ontologies. Methods: We developed a pipelined semantic concepts representation process to represent articles from an SR in an optimized and enriched semantic space of UMLS concepts. Throughout the process, we leveraged concepts and concept relations encoded in biomedical ontologies (SNOMED-CT and MeSH) within the UMLS framework to prompt concept features of each article. Article relationships (similarities) were established and represented as a semantic article network, which was readily applied to assist with the article screening process. We incorporated the concept of active learning to simulate an interactive article recommendation process, and evaluated the performance on 15 completed SRs. We used work saved over sampling at 95% recall (WSS95) as the performance measure. Results: We compared the WSS95 performance of our ontology-based semantic approach to existing lexical feature approaches and corpus-based semantic approaches, and found that we had better WSS95 in most SRs. We also had the highest average WSS95 of 43.81% and the highest total WSS95 of 657.18%. Conclusion: We demonstrated the use of ontology-based semantics to facilitate the identification of relevant articles for SRs. Effective concepts and concept relations derived from UMLS ontologies can be utilized to establish article semantic relationships. Our approach showed promising performance and can readily be applied to any SR topic in the biomedical domain.
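The abstract names WSS95 without defining it. A commonly used definition of work saved over sampling at recall R is WSS@R = (TN + FN)/N − (1 − R), where N is the number of candidate articles and TN + FN is the number left unscreened once recall R has been reached. The sketch below assumes that definition and assumes a ranked screening order; the function name and the toy ranking are illustrative, and the paper's exact evaluation procedure may differ.

```python
import math
from typing import Sequence


def wss_at_recall(ranked_labels: Sequence[int], recall: float = 0.95) -> float:
    """Work saved over sampling for a ranked list of 0/1 relevance labels.

    Articles are screened in ranked order until the target recall is reached;
    the remaining articles count as work saved.
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("at least one relevant article is required")
    needed = math.ceil(recall * total_relevant)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            # Unscreened fraction minus the recall allowed to be lost.
            return (n - screened) / n - (1.0 - recall)
    raise AssertionError("unreachable: all relevant articles are in the list")


# Toy ranking: 3 relevant articles among 10, all found within the first 4.
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(f"WSS95 = {wss_at_recall(ranking):.2f}")
```

In the toy ranking, screening stops after four of ten articles, so six tenths of the work is saved before the 0.05 recall allowance is subtracted.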

17.
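Objective: A breast ultrasound image ontology supports tasks such as semantic annotation and intelligent retrieval of breast ultrasound images. Taking breast ultrasound images as an example, this paper describes a method for constructing an ontology model for them. Methods: The concepts of the breast ultrasound image ontology were first determined by combining subject headings with high-frequency words from a corpus; the semantic relationships of the ontology were then derived with reference to the UMLS. Results: The breast ultrasound image ontology built in this study contains 1274 concepts and 56 types of semantic relationships, and was constructed with Protégé. Conclusion: Ontology concepts determined by combining subject headings with high-frequency corpus words characterize the semantics of breast ultrasound images well, and the construction method described here is also applicable to building ontologies in other domains.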

18.
Background: Large biomedical data sets have become increasingly important resources for medical researchers. Modern biomedical data sets are annotated with standard terms to describe the data and to support data linking between databases. The largest curated listing of biomedical terms is the National Library of Medicine's Unified Medical Language System (UMLS). The UMLS contains more than 2 million biomedical terms collected from nearly 100 medical vocabularies. Many of the vocabularies contained in the UMLS carry restrictions on their use, making it impossible to share or distribute UMLS-annotated research data. However, a subset of the UMLS vocabularies, designated Category 0 by UMLS, can be used to annotate and share data sets without violating the UMLS License Agreement.

19.
Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients’ problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to the creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two-character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications.

20.
Objective: Disease-specific vocabularies are fundamental to many knowledge-based intelligent systems and applications like text annotation, cohort selection, disease diagnostic modeling, and therapy recommendation. Reference standards are critical in the development and validation of automated methods for disease-specific vocabularies. The goal of the present study is to design and test a generalizable method for the development of vocabulary reference standards from expert-curated, disease-specific biomedical literature resources. Methods: We formed disease-specific corpora from literature resources like textbooks, evidence-based synthesized online sources, clinical practice guidelines, and journal articles. Medical experts annotated and adjudicated disease-specific terms in four classes (i.e., causes or risk factors, signs or symptoms, diagnostic tests or results, and treatment). Annotations were mapped to UMLS concepts. We assessed source variation, the contribution of each source to building disease-specific vocabularies, the saturation of the vocabularies with respect to the number of sources used, and the generalizability of the method to different diseases. Results: The study resulted in 2588 string-unique annotations for heart failure in four classes, and 193 and 425, respectively, for pulmonary embolism and rheumatoid arthritis in the treatment class. Approximately 80% of the annotations were mapped to UMLS concepts. The agreement among heart failure sources ranged between 0.28 and 0.46. The contribution of these sources to the final vocabulary ranged between 18% and 49%. With the sources explored, the heart failure vocabulary reached near saturation in all four classes with the inclusion of a minimum of six sources (or four to seven sources if only counting terms that occurred in two or more sources). It took fewer sources to reach near saturation for the other two diseases in terms of the treatment class. Conclusions: We developed a method for the development of disease-specific reference vocabularies. Expert-curated biomedical literature resources are valuable for acquiring disease-specific medical knowledge. It is feasible to reach near saturation in a disease-specific vocabulary using a relatively small number of literature sources.
