Similar Articles
20 similar articles found.
1.

Objectives

The UMLS constitutes the largest existing collection of medical terms. However, little has been published about the users and uses of the UMLS. This study sheds light on these issues.

Design

We designed a questionnaire consisting of 26 questions and distributed it to the UMLS user mailing list. Participants were assured complete confidentiality of their replies. To further encourage list members to respond, we promised to provide them with early results prior to publication. Sector analysis of the responses, according to the respondents' employing organizations, was used to gain insight into some of the answers.

Results

We received 70 responses. The study confirms two intended uses of the UMLS: access to source terminologies (75%) and mapping among them (44%). However, most access is to just a few sources, led by SNOMED, MeSH, and ICD. Of 119 reported purposes of use, terminology research (37), information retrieval (19), and terminology translation (14) lead. Four important observations are that the UMLS is widely used as a terminology (77%), even though it was not designed as one; that many users want the NLM to mark concepts with multiple parents in an indented hierarchy (73%); that as many want the NLM to derive a terminology from the UMLS (73%); and that auditing the UMLS is a top budget priority for users (35%).

Conclusions

The study reports many uses of the UMLS across a variety of areas, from terminology research to decision support and phenotyping. It confirms that the UMLS is used to access its source terminologies and to map among them. Two primary concerns of the existing user base are auditing the UMLS and the design of a UMLS-based derived terminology.

2.

Objective

To study existing problem list terminologies (PLTs), and to identify a subset of concepts based on standard terminologies that occur frequently in problem list data.

Design

Problem list terms and their usage frequencies were collected from large healthcare institutions.

Measurement

The pattern of usage of the terms was analyzed. The local terms were mapped to the Unified Medical Language System (UMLS). Based on the mapped UMLS concepts, the degree of overlap between the PLTs was analyzed.

Results

Six institutions submitted 76 237 terms and their usage frequencies in 14 million patients. The distribution of usage was highly skewed: on average, 21% of unique terms already covered 95% of usage. The most frequently used 14 395 terms, representing the union of terms that covered 95% of usage in each institution, were exhaustively mapped to the UMLS. Of these, 13 261 terms were successfully mapped to 6776 UMLS concepts. Less frequently used terms were generally less ‘mappable’ to the UMLS. The mean pairwise overlap of the PLTs was only 21% (median 19%). Concepts that were shared among institutions were used eight times more often than concepts unique to one institution. A SNOMED Problem List Subset of frequently used problem list concepts was identified.
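
The 95% coverage statistic above is straightforward to reproduce from a term-frequency table. The sketch below, with invented counts, sorts terms by usage and reports the fraction of unique terms needed to reach 95% cumulative coverage.

```python
# Sketch: given per-term usage counts, find the smallest fraction of unique
# terms whose cumulative usage covers 95% of all problem-list entries.
# The counts below are illustrative, not the study's data.

def coverage_fraction(usage_counts, target=0.95):
    """Return the fraction of unique terms needed to cover `target` usage."""
    counts = sorted(usage_counts, reverse=True)   # most-used terms first
    total = sum(counts)
    cumulative, terms_needed = 0, 0
    for c in counts:
        cumulative += c
        terms_needed += 1
        if cumulative / total >= target:
            break
    return terms_needed / len(counts)

usage = [9500, 3200, 1200, 400, 90, 40, 25, 10, 5, 5, 3, 2, 1, 1, 1]
print(f"{coverage_fraction(usage):.0%} of unique terms cover 95% of usage")
```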

Conclusions

Most of the frequently used problem list terms could be found in standard terminologies. The overlap between existing PLTs was low. The use of the SNOMED Problem List Subset will save developmental effort, reduce variability of PLTs, and enhance interoperability of problem list data.

3.

Objective

To evaluate state-of-the-art unsupervised methods on the word sense disambiguation (WSD) task in the clinical domain. In particular, to compare graph-based approaches relying on a clinical knowledge base with bottom-up topic-modeling-based approaches. We investigate several enhancements to the topic-modeling techniques that use domain-specific knowledge sources.

Materials and methods

The graph-based methods use variations of PageRank and distance-based similarity metrics, operating over the Unified Medical Language System (UMLS). Topic-modeling methods use unlabeled data from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database to derive models for each ambiguous word. We investigate the impact of using different linguistic features for topic models, including UMLS-based and syntactic features. We use a sense-tagged clinical dataset from the Mayo Clinic for evaluation.
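
As an illustration of the graph-based approach, the sketch below runs personalized PageRank over a toy concept graph standing in for a UMLS-derived relation network; the graph, node names, and seeding scheme are hypothetical, not the paper's exact configuration.

```python
# Sketch of graph-based WSD: seed personalized PageRank with the context's
# unambiguous concepts and pick the candidate sense that ranks highest.
# The toy graph below stands in for a UMLS-derived relation graph.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("cold_temperature", "weather"), ("cold_temperature", "hypothermia"),
    ("common_cold", "cough"), ("common_cold", "fever"),
    ("fever", "infection"), ("cough", "infection"),
])

def disambiguate(candidate_senses, context_concepts, graph):
    seeds = {c: 1.0 for c in context_concepts if c in graph}
    ranks = nx.pagerank(graph, personalization=seeds)
    return max(candidate_senses, key=lambda s: ranks.get(s, 0.0))

# "cold" in a note mentioning cough and fever resolves to the disorder sense.
print(disambiguate(["cold_temperature", "common_cold"], ["cough", "fever"], G))
```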

Results

The topic-modeling methods achieve 66.9% accuracy on a subset of the Mayo Clinic's data, while the graph-based methods only reach the 40–50% range, with a most-frequent-sense baseline of 56.5%. Features derived from the UMLS semantic type and concept hierarchies do not produce a gain over bag-of-words features in the topic models, but identifying phrases from UMLS and using syntax does help.

Discussion

Although topic models outperform graph-based methods, semantic features derived from the UMLS prove too noisy to improve performance beyond bag-of-words.

Conclusions

Topic modeling for WSD provides superior results in the clinical domain; however, domain knowledge remains to be effectively integrated.

4.

Objective

This study has two objectives: first, to identify and characterize consumer health terms not found in the Unified Medical Language System (UMLS) Metathesaurus (2007 AB); second, to describe the procedure for creating new concepts in the process of building a consumer health vocabulary. How do the unmapped consumer health concepts relate to the existing UMLS concepts? What is the place of these new concepts in professional medical discourse?

Design

The consumer health terms were extracted from two large corpora derived in the process of Open Access Collaboratory Consumer Health Vocabulary (OAC CHV) building. Terms that could not be mapped to existing UMLS concepts via machine and manual methods prompted creation of new concepts, which were then ascribed semantic types, related to existing UMLS concepts, and coded according to specified criteria.

Results

This approach identified 64 unmapped concepts, 17 of which were labeled as uniquely “lay” and not feasible for inclusion in professional health terminologies. The remaining terms constituted potential candidates for inclusion in professional vocabularies, or could be constructed by post-coordinating existing UMLS terms. The relationship between new and existing concepts differed depending on the corpora from which they were extracted.

Conclusion

Unmapped concepts constitute a small proportion of consumer health terms, but one that is likely to affect the process of consumer health vocabulary building. We have described a novel approach for identifying such concepts.

5.
6.

Background

Current image sharing is carried out either by patients manually transporting CDs or through organization-coordinated sharing networks. The former places a significant burden on patients and providers; the latter faces challenges to patient privacy.

Objective

To allow healthcare providers efficient access to medical imaging data acquired at other unaffiliated healthcare facilities while ensuring strong protection of patient privacy and minimizing burden on patients, providers, and the information technology infrastructure.

Methods

An image sharing framework is described that involves patients as an integral part of, and with full control of, the image sharing process. Central to this framework is the Patient Controlled Access-key REgistry (PCARE) which manages the access keys issued by image source facilities. When digitally signed by patients, the access keys are used by any requesting facility to retrieve the associated imaging data from the source facility. A centralized patient portal, called a PCARE patient control portal, allows patients to manage all the access keys in PCARE.
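
The access-key flow can be pictured as a sign-then-verify protocol. The hypothetical Python sketch below uses Ed25519 signatures from the `cryptography` package; the key names, payload fields, and JSON format are invented for illustration and are not the actual PCARE implementation.

```python
# Hypothetical sketch of the access-key flow: the source facility issues a
# key, the patient signs it, and a requesting facility verifies the signature
# before retrieving images. Not the actual PCARE implementation.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

patient_key = Ed25519PrivateKey.generate()          # held by the patient
access_key = json.dumps({                           # issued by source facility
    "study_id": "CT-2011-0042", "facility": "source-hospital.example",
    "expires": "2011-12-31",
}).encode()

signature = patient_key.sign(access_key)            # patient authorizes release

# The requesting facility checks the signature against the patient's public key.
patient_key.public_key().verify(signature, access_key)   # raises if invalid
print("access key verified; imaging data may be retrieved")
```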

Results

A prototype of the PCARE framework has been developed by extending open-source technology. The results for feasibility, performance, and user assessments are encouraging and demonstrate the benefits of patient-controlled image sharing.

Discussion

The PCARE framework is effective in many important clinical cases of image sharing and can be used to integrate organization-coordinated sharing networks. The same framework can also be used to realize a longitudinal virtual electronic health record.

Conclusion

The PCARE framework allows prior imaging data to be shared among unaffiliated healthcare facilities while protecting patient privacy with minimal burden on patients, providers, and infrastructure. A prototype has been implemented to demonstrate the feasibility and benefits of this approach.

7.

Background

Computer-aided diagnosis for screening utilizes computer-based analytical methodologies to process patient information. Glaucoma is the leading cause of irreversible blindness. Due to the lack of an effective and standard screening practice, more than 50% of cases are undiagnosed, which prevents early treatment of the disease.

Objective

To design an automatic glaucoma diagnosis architecture, AGLAIA-MII (automatic glaucoma diagnosis through medical imaging informatics), that combines patient personal data, retinal fundus images, and patient genome information for screening.

Materials and methods

2258 cases from a population study were used to evaluate the screening software. Each case included patient personal data, retinal images, and quality-controlled genome data. Using a multiple-kernel-learning-based classifier, AGLAIA-MII combined patient personal data, major image features, and important genome single nucleotide polymorphism (SNP) features.
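
A minimal sketch of the multiple-kernel idea follows: per-modality kernel matrices are combined into one kernel fed to a precomputed-kernel SVM. Real multiple kernel learning optimizes the combination weights; here they are fixed, and all data are synthetic.

```python
# Sketch of combining per-modality kernels (personal data, image features,
# genome SNPs) into one kernel for an SVM. Real MKL learns the weights;
# here they are fixed for illustration, and the data are synthetic.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_personal = rng.normal(size=(60, 4))    # e.g., age, intraocular pressure, ...
X_image = rng.normal(size=(60, 10))      # e.g., cup-to-disc ratio features
X_snp = rng.integers(0, 3, size=(60, 20)).astype(float)  # SNP genotype counts
y = rng.integers(0, 2, size=60)          # glaucoma / no glaucoma (synthetic)

weights = [0.2, 0.4, 0.4]                # one fixed weight per modality
K = (weights[0] * rbf_kernel(X_personal)
     + weights[1] * rbf_kernel(X_image)
     + weights[2] * linear_kernel(X_snp))

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```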

Results and discussion

Receiver operating characteristic curves were plotted to compare AGLAIA-MII's performance with classifiers using patient personal data, images, and genome SNPs separately. AGLAIA-MII achieved an area under the curve (AUC) of 0.866, better than the 0.551, 0.722, and 0.810 achieved by the individual personal data, image, and genome information components, respectively. AGLAIA-MII also demonstrated a substantial improvement over the current glaucoma screening approach based on intraocular pressure.

Conclusions

AGLAIA-MII demonstrates for the first time the capability of integrating patients’ personal data, medical retinal images, and genome information for automatic glaucoma diagnosis and screening in a large dataset from a population study. It paves the way for a holistic approach to automatic, objective glaucoma diagnosis and screening.

8.

Objective

With the increased routine use of advanced imaging in clinical diagnosis and treatment, it has become imperative to provide patients with a means to view and understand their imaging studies. We illustrate the feasibility of a patient portal that automatically structures and integrates radiology reports with corresponding imaging studies according to several information orientations tailored for the layperson.

Methods

The imaging patient portal is composed of an image processing module for the creation of a timeline that illustrates the progression of disease, a natural language processing module to extract salient concepts from radiology reports (73% accuracy, F1 score of 0.67), and an interactive user interface navigable by an imaging findings list. The portal was developed as a Java-based web application and is demonstrated for patients with brain cancer.

Results and discussion

The system was exhibited at an international radiology conference to solicit feedback from a diverse group of healthcare professionals. There was wide support for educating patients about their imaging studies, and an appreciation for the informatics tools used to simplify images and reports for consumer interpretation. Primary concerns included the possibility of patients misunderstanding their results, as well as worries regarding accidental improper disclosure of medical information.

Conclusions

Radiologic imaging constitutes a significant portion of the evidence used to make diagnostic and treatment decisions, yet there are few tools for explaining this information to patients. The proposed radiology patient portal provides a framework for organizing radiologic results into several information orientations to support patient education.

9.

Objective

To develop an automated, high-throughput, and reproducible method for reclassifying and validating ontological concepts for natural language processing applications.

Design

We developed a distributional similarity approach to classify the Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. We used contextual features based on syntactic properties obtained from two different large corpora and used α-skew divergence as the similarity measure.
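
For reference, the α-skew divergence (Lee, 1999) smooths one distribution with the other before taking a KL divergence, which avoids the infinite values that arise when a feature has zero probability. A small sketch with toy distributions:

```python
# Sketch of the alpha-skew divergence used as the similarity measure:
# s_a(q, r) = KL(r || a*q + (1-a)*r). The mixing term avoids infinite
# KL values when q assigns zero probability to a feature seen in r.
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def alpha_skew(q, r, alpha=0.99):
    q, r = np.asarray(q, float), np.asarray(r, float)
    return kl(r, alpha * q + (1 - alpha) * r)

# Two toy context-feature distributions: a concept and a semantic-class profile.
concept = np.array([0.5, 0.3, 0.2, 0.0])
profile = np.array([0.4, 0.4, 0.1, 0.1])
print(f"skew divergence: {alpha_skew(concept, profile):.4f}")
```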

Measurements

The testing sets were automatically generated based on the changes by the National Library of Medicine to the semantic classification of concepts from the UMLS 2005AA to the 2006AA release. Error rates were calculated and a misclassification analysis was performed.

Results

The estimated lowest error rates were 0.198 and 0.116 when considering the correct classification to be covered by our top prediction and top 2 predictions, respectively.

Conclusion

The results demonstrated that the distributional similarity approach can recommend high-level semantic classifications suitable for use in natural language processing.

10.

Objective

To create a computable MEDication Indication resource (MEDI) to support primary and secondary use of electronic medical records (EMRs).

Materials and methods

We processed four public medication resources, RxNorm, Side Effect Resource (SIDER) 2, MedlinePlus, and Wikipedia, to create MEDI. We applied natural language processing and ontology relationships to extract indications for prescribable, single-ingredient medication concepts and all ingredient concepts as defined by RxNorm. Indications were coded as Unified Medical Language System (UMLS) concepts and International Classification of Diseases, 9th edition (ICD9) codes. A total of 689 extracted indications were randomly selected and manually reviewed for accuracy by two physicians. We identified a subset of medication–indication pairs that optimizes recall while maintaining high precision.

Results

MEDI contains 3112 medications and 63 343 medication–indication pairs. Wikipedia was the largest resource, with 2608 medications and 34 911 pairs. For each resource, estimated precision and recall, respectively, were 94% and 20% for RxNorm, 75% and 33% for MedlinePlus, 67% and 31% for SIDER 2, and 56% and 51% for Wikipedia. The MEDI high-precision subset (MEDI-HPS) includes indications found within either RxNorm or at least two of the three other resources. MEDI-HPS contains 13 304 unique indication pairs covering 2136 medications. The mean±SD number of indications per medication in MEDI-HPS is 6.22±6.09. The estimated precision of MEDI-HPS is 92%.
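
The MEDI-HPS rule stated above is easy to express as set logic. A sketch with invented medication–ICD9 pairs:

```python
# Sketch of the MEDI high-precision subset rule as stated above: keep a
# medication-indication pair if RxNorm asserts it, or if at least two of
# SIDER 2, MedlinePlus, and Wikipedia agree on it. Pairs are illustrative.
def medi_hps(pairs_by_source):
    rxnorm = pairs_by_source["rxnorm"]
    others = [pairs_by_source[s] for s in ("sider2", "medlineplus", "wikipedia")]
    candidates = set().union(rxnorm, *others)
    return {p for p in candidates
            if p in rxnorm or sum(p in src for src in others) >= 2}

sources = {
    "rxnorm":      {("metformin", "250.00")},                  # kept: in RxNorm
    "sider2":      {("lisinopril", "401.9"), ("aspirin", "410.9")},
    "medlineplus": {("lisinopril", "401.9")},
    "wikipedia":   {("lisinopril", "401.9"), ("aspirin", "346.9")},
}
print(sorted(medi_hps(sources)))   # lisinopril kept (3 sources); singletons dropped
```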

Conclusions

MEDI is a publicly available, computable resource that links medications with their indications as represented by concepts and billing codes. MEDI may benefit clinical EMR applications and reuse of EMR data for research.

11.

Objective

To address the problem of mapping local laboratory terminologies to Logical Observation Identifiers Names and Codes (LOINC). To study different ontology matching algorithms and investigate how the probability of term combinations in LOINC helps to increase match quality and reduce manual effort.

Materials and methods

We proposed two matching strategies: full name and multi-part. The multi-part approach also considers the occurrence probability of combined concept parts. It can further recommend possible combinations of concept parts to allow more local terms to be mapped. Three real-world laboratory databases from Taiwanese hospitals were used to validate the proposed strategies with respect to different quality measures and execution run time. A comparison with the commonly used tool, Regenstrief LOINC Mapping Assistant (RELMA) Lab Auto Mapper (LAM), was also carried out.
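
One way to picture the multi-part strategy: split a local term into LOINC-style parts, score each candidate by how many parts match, and weight the score by how often that part combination occurs in LOINC. The sketch below uses invented parts and probabilities and is not the paper's exact scoring function.

```python
# Sketch of the multi-part idea: a local term is split into LOINC-style parts
# (component, property, system), candidates are scored by part matches, and
# the score is weighted by the occurrence probability of the part combination.
# The part tuples and probabilities below are invented for illustration.
combo_prob = {  # P(part combination) estimated from the LOINC table
    ("glucose", "mass_conc", "serum"): 0.004,
    ("glucose", "mass_conc", "urine"): 0.001,
}

def score(local_parts, candidate_parts):
    matched = sum(a == b for a, b in zip(local_parts, candidate_parts))
    return (matched / len(candidate_parts)) * combo_prob.get(candidate_parts, 0)

local = ("glucose", "mass_conc", "serum")
candidates = [("glucose", "mass_conc", "serum"), ("glucose", "mass_conc", "urine")]
best = max(candidates, key=lambda c: score(local, c))
print("best LOINC candidate:", best)
```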

Results

The new multi-part strategy yields the best match quality, with F-measure values between 89% and 96%. It can automatically match 70–85% of the laboratory terminologies to LOINC. The recommendation step can further propose mapping to (proposed) LOINC concepts for 9–20% of the local terminology concepts. On average, 91% of the local terminology concepts can be correctly mapped to existing or newly proposed LOINC concepts.

Conclusions

The mapping quality of the multi-part strategy is significantly better than that of LAM. It enables domain experts to perform LOINC matching with little manual work. The probability of term combinations proved to be a valuable strategy for increasing the quality of match results, providing recommendations for proposed LOINC concepts, and decreasing the run time for match processing.

12.

Objective

To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components.

Methods

Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed.

Results

The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations.

Conclusions

This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.

13.

Objective

As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigorous benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge.

Design

The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline.

Measurements

Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set.
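
Micro-averaging pools true positives, false positives, and false negatives across all classes before computing precision and recall, so frequent classes dominate the score. A sketch with illustrative counts:

```python
# Sketch of micro-averaged F1: pool true/false positives and false negatives
# over all annotation classes before computing precision and recall.
# The per-class counts below are illustrative, not the challenge's data.
def micro_f1(per_class_counts):
    tp = sum(c["tp"] for c in per_class_counts)
    fp = sum(c["fp"] for c in per_class_counts)
    fn = sum(c["fn"] for c in per_class_counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

counts = [  # e.g., problem / test / treatment concept classes
    {"tp": 820, "fp": 130, "fn": 150},
    {"tp": 400, "fp": 60, "fn": 90},
    {"tp": 310, "fp": 55, "fn": 40},
]
print(f"micro-averaged F1: {micro_f1(counts):.4f}")
```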

Results

The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second).

Conclusion

For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.

14.

Objective

As large-scale medical imaging studies become more common, there is an increasing reliance on automated software to extract quantitative information from these images. As cohort sizes keep growing, there is also a need for tools that present the results of automated image processing and analysis in a way that enables fast and efficient quality checking, and tagging and reporting of cases in which automatic processing failed or was problematic.

Materials and methods

MilxXplore is an open-source visualization platform which provides an interface to navigate and explore imaging data in a web browser, giving the end user the opportunity to perform quality control and reporting in a user-friendly, collaborative, and efficient way.

Discussion

Compared to existing software solutions that often provide an overview of the results at the subject's level, MilxXplore pools the results of individual subjects and time points together, allowing easy and efficient navigation and browsing through the different acquisitions of a subject over time, and comparing the results against the rest of the population.

Conclusions

MilxXplore is fast, flexible and allows remote quality checks of processed imaging data, facilitating data sharing and collaboration across multiple locations, and can be easily integrated into a cloud computing pipeline. With the growing trend of open data and open science, such a tool will become increasingly important to share and publish results of imaging analysis.

15.

Objective

Explore the automated acquisition of knowledge in biomedical and clinical documents using text mining and statistical techniques to identify disease-drug associations.

Design

Biomedical literature and clinical narratives from the patient record were mined to gather knowledge about disease-drug associations. Two NLP systems, BioMedLEE and MedLEE, were applied to Medline articles and discharge summaries, respectively. Disease and drug entities were identified using the NLP systems in addition to MeSH annotations for the Medline articles. Focusing on eight diseases, co-occurrence statistics were applied to compute and evaluate the strength of association between each disease and relevant drugs.
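
A minimal version of such co-occurrence scoring ranks, for a given disease, the drugs appearing in the same documents by a simple conditional probability; the study's actual statistic may differ.

```python
# Sketch of co-occurrence scoring for disease-drug association: count how
# often a drug appears in documents mentioning the disease and rank by a
# simple conditional probability. Illustrative only; not the paper's statistic.
from collections import Counter

def rank_drugs(documents, disease):
    """documents: list of sets of extracted disease/drug concepts."""
    with_disease = [d for d in documents if disease in d]
    drug_counts = Counter(c for doc in with_disease for c in doc if c != disease)
    n = len(with_disease)
    return sorted(((drug, cnt / n) for drug, cnt in drug_counts.items()),
                  key=lambda x: -x[1])

docs = [{"asthma", "albuterol"}, {"asthma", "albuterol", "prednisone"},
        {"asthma", "prednisone"}, {"hypertension", "lisinopril"}]
print(rank_drugs(docs, "asthma"))
```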

Results

Ranked lists of disease-drug pairs were generated and cutoffs calculated for identifying stronger associations among these pairs for further analysis. Differences and similarities between the text sources (i.e., biomedical literature and patient record) and annotations (i.e., MeSH and NLP-extracted UMLS concepts) with regard to disease-drug knowledge were observed.

Conclusion

This paper presents a method for acquiring disease-specific knowledge and a feasibility study of the method. The method is based on applying a combination of NLP and statistical techniques to both biomedical and clinical documents. The approach enabled extraction of knowledge about the drugs clinicians are using for patients with specific diseases based on the patient record, while also acquiring knowledge of drugs frequently involved in controlled trials for those same diseases. In comparing the disease-drug associations, we found the results to be appropriate: the two text sources contained consistent as well as complementary knowledge, and manual review of the top five disease-drug associations by a medical expert supported their correctness across the diseases.

16.

Objective

To develop a supervised machine learning approach for discovering relations between medical problems, treatments, and tests mentioned in electronic medical records.

Materials and methods

A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric inform the classifier.

Results

The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7.
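
Plugging the reported precision and recall into the stated formula reproduces the F1 scores, up to rounding of the published inputs:

```python
# Checking the reported scores against the stated formula
# F1 = 2 * Precision * Recall / (Precision + Recall).
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f"gold-standard run:   F1 = {f1(72.0, 75.3):.1f}")   # paper reports 73.7
print(f"fully automatic run: F1 = {f1(57.6, 41.7):.1f}")   # paper reports 48.4
```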

Discussion

Although a rich set of features was developed for the classifiers presented in this paper, little knowledge mining was performed from medical ontologies such as those found in the UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation was discovered independently; joint classification of relations may improve the quality of the results, as may joint learning of concept, assertion, and relation discovery.

Conclusion

Lexical and contextual features proved to be very important in relation extraction from medical texts; when they are not available to the classifier, the F1 score decreases by 3.7%. In addition, removing the similarity-based features decreases F1 by 1.1%.

17.

Objective

At present, most clinical data are exchanged between organizations within a regional system. However, people traveling abroad may need to visit a hospital, which would make international exchange of clinical data very useful.

Background

Since 2007, a collaborative effort to achieve clinical data sharing has been carried out at Zhejiang University in China and Kyoto University and Miyazaki University in Japan; each is running a regional clinical information center.

Methods

An international layer system named Global Dolphin was constructed with several key services, sharing patients' health information between countries using a medical markup language (MML). The system was piloted with 39 test patients.

Results

The three regions above have records for 966 000 unique patients, which are available through Global Dolphin. Data exchanged successfully from Japan to China for the 39 study patients included 1001 MML files and 152 images. The MML files contained 197 free-text paragraphs that needed human translation.

Discussion

The pilot test in Global Dolphin demonstrates that patient information can be shared across countries through international health data exchange. To achieve cross-border sharing of clinical data, some key issues had to be addressed: establishment of a super directory service across countries, data transformation, and language translation. Privacy protection was also taken into account. The system is now ready for live use.

Conclusion

The project demonstrates a means of achieving worldwide accessibility of medical data, by which the integrity and continuity of patients' health information can be maintained.

18.

Context

TimeText is a temporal reasoning system designed to represent, extract, and reason about temporal information in clinical text.

Objective

To measure the accuracy of TimeText in processing clinical discharge summaries.

Design

Six physicians with biomedical informatics training served as domain experts. Twenty discharge summaries were randomly selected for the evaluation. For each of the first 14 reports, 5 to 8 clinically important medical events were chosen. The temporal reasoning system generated temporal relations about the endpoints (start or finish) of pairs of medical events. Two experts (subjects) manually generated temporal relations for these medical events. The system-generated and expert-generated results were assessed by four other experts (raters). All twenty discharge summaries were used to assess the system's accuracy in answering time-oriented clinical questions. For each report, five to ten clinically plausible temporal questions about events were generated. Two experts generated answers to the questions to serve as the gold standard. We wrote queries to retrieve answers from the system's output.

Measurements

Correctness of generated temporal relations, recall of clinically important relations, and accuracy in answering temporal questions.

Results

The raters determined that 97% of subjects’ 295 generated temporal relations were correct and that 96.5% of the system’s 995 generated temporal relations were correct. The system captured 79% of 307 temporal relations determined to be clinically important by the subjects and raters. The system answered 84% of the temporal questions correctly.

Conclusion

The system encoded the majority of information identified by experts, and was able to answer simple temporal questions.

19.

Purpose

Several studies have explored patient use of the internet for health information. In contrast, the physicians' perspective on the evolving internet environment is lacking. The purpose of this study is to assess and correlate the extent of internet use among healthcare professionals and to examine its effects on clinical practice.

Methods

Cross-sectional survey conducted in the USA using questionnaires distributed randomly to healthcare professionals attending distinct continuing medical education programmes between 2003 and 2004. Multiple-choice and yes/no questions related to trends in internet use and its effects on clinical practice were extracted and the responses analysed. The main outcome measures were self-reported rates of internet use, perceived effects, and the role of medical web sites in clinical practice.

Results

The overall response rate was 60%. A total of 277 survey respondents (97%) had internet access. Some 7% of private-practice and 1% of group-practice physicians did not have internet access. Most (71%) used the internet regularly for medical or professional updating, and 62% (n = 178) felt the need to share web sites designed for healthcare professionals with patients. Some 27% of the physicians owned established personal practice web sites. Sixty-three per cent had recommended a web site to a patient for more information, matching the positive trust (>70%) in the general quality of selected medical web sites.

Conclusion

This cross-sectional survey shows that internet use and web-based medical information are widely popular among physicians and patients. About 23–31% of the healthcare professionals report >80% interaction with web-informed patients in their daily practice.

20.

Background

Temporal information detection systems have been developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge.

Objective

To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text.

Materials and methods

The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features including lexical information, natural language elements, and medical ontology features. The TIMEX3 system employed a rule-based method using regular expression pattern matching and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in the Apache Unstructured Information Management Architecture framework.
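
As a flavor of the rule-based TIMEX3 approach, the sketch below pairs two regular expressions with normalization rules that emit ISO 8601 values; production systems use far larger pattern sets, and these two patterns are illustrative only.

```python
# Sketch of rule-based TIMEX3 extraction: regular expressions find date
# mentions and a small rule normalizes each match to an ISO 8601 value.
import re

MONTHS = {"january": 1, "february": 2, "march": 3, "april": 4, "may": 5,
          "june": 6, "july": 7, "august": 8, "september": 9,
          "october": 10, "november": 11, "december": 12}

PATTERNS = [
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),          # already ISO-like
     lambda m: f"{m[1]}-{m[2]}-{m[3]}"),
    (re.compile(r"\b(January|February|March|April|May|June|July|August|"
                r"September|October|November|December)\s+(\d{1,2}),\s*(\d{4})",
                re.IGNORECASE),                            # "March 5, 2012"
     lambda m: f"{m[3]}-{MONTHS[m[1].lower()]:02d}-{int(m[2]):02d}"),
]

text = "Admitted on March 5, 2012; discharged 2012-03-09."
for pattern, normalize in PATTERNS:
    for match in pattern.finditer(text):
        print(match.group(0), "->", normalize(match))
```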

Results

Our TIMEX3 system performed the best (F-measure of 0.900, value accuracy 0.731) among the challenge teams. The Event system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537.

Conclusions

Our TIMEX3 system demonstrated the strong capability of regular expression rules to extract and normalize time information. The Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance.
