Similar Articles
20 similar articles retrieved.
1.

Background

Multimorbidity, the co-occurrence of two or more chronic medical conditions within a single individual, is increasingly becoming part of daily care of general medical practice. Literature-based discovery may help to investigate the patterns of multimorbidity and to integrate medical knowledge for improving healthcare delivery for individuals with co-occurring chronic conditions.

Objective

To explore the usefulness of literature-based discovery in primary care research through the key-case of finding associations between psychiatric and somatic diseases relevant to general practice in a large biomedical literature database (Medline).

Methods

By using literature-based discovery for matching disease profiles as vectors in a high-dimensional associative concept space, co-occurrences of a broad spectrum of chronic medical conditions were matched for their potential in biomedicine. An experimental setting was chosen in parallel with expert evaluations and expert meetings to assess performance and to generate targets for integrating literature-based discovery in multidisciplinary medical research of psychiatric and somatic disease associations.

Results

Through stepwise reductions a reference set of 21 945 disease combinations was generated, from which a set of 166 combinations between psychiatric and somatic diseases was selected and assessed by text mining and expert evaluation.

Conclusions

Literature-based discovery tools generate specific patterns of associations between psychiatric and somatic diseases: one subset was appraised as promising for further research; the other subset surprised the experts, leading to intricate discussions and further eliciting of frameworks of biomedical knowledge. These frameworks enable us to specify targets for further developing and integrating literature-based discovery in multidisciplinary research of general practice, psychology and psychiatry, and epidemiology.
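The methods above match disease profiles as vectors in a high-dimensional associative concept space. The paper's exact construction of that space is not reproduced here; the sketch below is a minimal illustration, assuming each disease is represented by a vector of association weights over a shared concept list and using cosine similarity as one plausible matching measure.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two concept-space vectors (0 when either is all-zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Toy disease profiles: association weights over a shared list of concepts
# (illustrative values only; the actual associative concept space is built from Medline).
concepts = ["insulin", "serotonin", "inflammation", "cortisol"]
profiles = {
    "depression":      np.array([0.1, 0.9, 0.4, 0.7]),
    "type_2_diabetes": np.array([0.9, 0.1, 0.6, 0.5]),
}

score = cosine_similarity(profiles["depression"], profiles["type_2_diabetes"])
print(f"profile similarity: {score:.3f}")
```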

2.
Literature-based discovery derives new scientific hypotheses by mining latent indirect associations in the literature, and it has found increasingly wide application in the biomedical domain. Evaluating indirect associations is a current focus of literature-based discovery research. This paper examines the role of network features in evaluating indirect associations during literature-based discovery and, by integrating co-occurrence statistics with network structural features, establishes a new indicator for evaluating indirect associations, which is of practical importance for improving the efficiency of literature-based discovery and for building knowledge discovery systems.
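The abstract integrates co-occurrence statistics with network structural features into a new indicator for indirect associations. The exact indicator is not given in the abstract; the following is a hedged sketch of the general idea, assuming a term co-occurrence graph and scoring an unlinked A–C pair by summing evidence over shared intermediates B, with a degree-based penalty so that hub terms contribute less.

```python
import networkx as nx

def indirect_association_score(g: nx.Graph, a: str, c: str) -> float:
    """Score an unlinked A-C pair via shared neighbours B, down-weighting hub intermediates.
    Edge attribute 'count' holds the A-B / B-C co-occurrence frequency (illustrative)."""
    shared = set(g.neighbors(a)) & set(g.neighbors(c))
    score = 0.0
    for b in shared:
        co_ab = g[a][b].get("count", 1)
        co_bc = g[b][c].get("count", 1)
        score += (co_ab * co_bc) / g.degree(b)  # degree penalty: hubs contribute less
    return score

# Toy co-occurrence network (term pairs with co-occurrence counts).
g = nx.Graph()
g.add_edge("fish_oil", "blood_viscosity", count=12)
g.add_edge("blood_viscosity", "raynauds_syndrome", count=8)
g.add_edge("fish_oil", "platelet_aggregation", count=20)
g.add_edge("platelet_aggregation", "raynauds_syndrome", count=5)

print(indirect_association_score(g, "fish_oil", "raynauds_syndrome"))
```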

3.
Objective Many tasks in natural language processing utilize lexical pattern-matching techniques, including information extraction (IE), negation identification, and syntactic parsing. However, it is generally difficult to derive patterns that achieve acceptable levels of recall while also remaining highly precise. Materials and Methods We present a multiple sequence alignment (MSA)-based technique that automatically generates patterns, thereby leveraging language usage to determine the context of words that influence a given target. MSAs capture the commonalities among word sequences and are able to reveal areas of linguistic stability and variation. In this way, MSAs provide a systemic approach to generating lexical patterns that are generalizable, which will both increase recall levels and maintain high levels of precision. Results The MSA-generated patterns exhibited consistent F1-, F0.5-, and F2-scores compared to two baseline techniques for IE across four different tasks. Both baseline techniques performed well for some tasks and less well for others, but MSA was found to consistently perform at a high level for all four tasks. Discussion The performance of MSA on the four extraction tasks indicates the method's versatility. The results show that the MSA-based patterns are able to handle the extraction of individual data elements as well as relations between two concepts without the need for large amounts of manual intervention. Conclusion We presented an MSA-based framework for generating lexical patterns that showed consistently high levels of both performance and recall over four different extraction tasks when compared to baseline methods.
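For reference, the F1, F0.5, and F2 scores reported above are instances of the standard F-beta measure, which weights recall beta times as much as precision; the values below are illustrative, not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.80, 0.60  # illustrative precision and recall for an extraction task
for beta in (0.5, 1.0, 2.0):
    print(f"F{beta:g} = {f_beta(p, r, beta):.3f}")
```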

4.

Objective

This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts.

Design

The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods.

Results

The authors' assertion annotation system obtained an F-measure of 0.931, ranking fifth out of 21 participants at the i2b2/VA 2010 challenge. The authors' relation annotation system ranked third out of 16 participants with a 0.709 F-measure. The 0.773 F-measure the authors obtained on concept extraction did not make it to the top 10.

Conclusion

On the one hand, the authors confirm that the use of only machine-learning methods is highly dependent on the annotated training data, and thus obtained better results for well-represented classes. On the other hand, the use of only a rule-based method was not sufficient to deal with new types of data. Finally, the use of hybrid approaches combining machine-learning and rule-based approaches yielded higher scores.

5.
Objective The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications. Materials We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. Results There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. Conclusions For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice.
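As a hedged illustration of the "simple" end of the trade-off discussed above, the sketch below performs greedy longest-match dictionary lookup over a tokenized note; it is not the NCBO Annotator or REVEAL, and the term dictionary and concept codes are illustrative.

```python
# Greedy longest-match dictionary tagger (illustrative; not the NCBO Annotator or REVEAL).
# Values are illustrative UMLS-style concept identifiers.
TERMS = {
    ("type", "2", "diabetes"): "C0011860",
    ("diabetes",): "C0011849",
    ("hypertension",): "C0020538",
}
MAX_LEN = max(len(t) for t in TERMS)

def tag(tokens):
    i, found = 0, []
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):  # try the longest match first
            span = tuple(w.lower() for w in tokens[i:i + n])
            if span in TERMS:
                found.append((" ".join(tokens[i:i + n]), TERMS[span]))
                i += n
                break
        else:
            i += 1
    return found

print(tag("Patient has Type 2 Diabetes and hypertension .".split()))
```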

6.
Literature retrieval helps users locate and obtain target documents quickly, but unstructured documents still have to be read manually before useful knowledge can be extracted, which severely limits the efficiency of discovering knowledge from large document collections. Literature-based discovery forms scientific hypotheses by uncovering latent associations within a document collection, so using text mining to transform an unstructured collection into a graph-structured model matters both for carrying out knowledge discovery and for deeper mining. We therefore apply complex-network methods to text-mine document collections, discuss the important role that graph-structured organization of associated knowledge plays in literature-based discovery, reveal implicit knowledge hidden in non-interacting literatures, and build an application model for knowledge discovery over literature information networks.

7.
Text mining can discover knowledge in the vast literature of traditional Chinese medicine (TCM) to support TCM clinical research and drug development. Surveying existing work, this paper identifies text classification and information extraction as the key technologies for knowledge discovery in the TCM literature, singles out TCM text classification, discovery of knowledge from non-interacting literatures, and information extraction from TCM literature as the three main research directions, and discusses the key problems to be solved and the research directions in each of the three areas. Finally, it outlines the application prospects of text mining in TCM and argues that knowledge from non-interacting literatures will become a focus of research on integrating Chinese and Western medicine.

8.
DNA microarrays are a functional genomics technique developed in recent years. With DNA microarrays, relevant functional genes and their expression regulatory networks can be identified at the genome level, helping to elucidate the biological functions and mechanisms of these genes. Drawing on related text-mining research, this paper discusses the principles and characteristics of DNA microarray technology, data analysis and interpretation, gene clustering methods, and literature-based microarray analysis. Literature-based microarray analysis identifies hidden, semantically related biological concepts and reasons over them to uncover implicit new knowledge. Four analysis approaches are described: statistics-based, natural-language-processing-based, association-rule-mining-based, and pattern-recognition-based. Combining text mining with DNA microarray analysis helps to discover interactions between genes or proteins, to recognize biological terms automatically, and to improve the efficiency of data analysis.

9.
10.
Background Asthma is a heterogeneous disease for which a strong genetic basis has been firmly established. Until now no studies have been undertaken to systematically explore the network of asthma-related...

11.

Objective

Negation is a linguistic phenomenon that marks the absence of an entity or event. Negated events are frequently reported in both biological literature and clinical notes. Text mining applications benefit from the detection of negation and its scope. However, due to the complexity of language, identifying the scope of negation in a sentence is not a trivial task.

Design

Conditional random fields (CRF), a supervised machine-learning algorithm, were used to train models to detect negation cue phrases and their scope in both biological literature and clinical notes. The models were trained on the publicly available BioScope corpus.

Measurement

The performance of the CRF models was evaluated on identifying the negation cue phrases and their scope by calculating recall, precision and F1-score. The models were compared with four competitive baseline systems.

Results

The best CRF-based model performed statistically better than all baseline systems and NegEx, achieving an F1-score of 98% and 95% on detecting negation cue phrases and their scope in clinical notes, and an F1-score of 97% and 85% on detecting negation cue phrases and their scope in biological literature.

Conclusions

This approach is robust, as it can identify negation scope in both biological and clinical text. To benefit text mining applications, the system is publicly available as a Java API and as an online application at http://negscope.askhermes.org.
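The design above trains conditional random fields on BioScope to tag negation cues and their scope. The sketch below is not the authors' system or feature set; it is a minimal token-level BIO-tagging example with the sklearn-crfsuite package, using a toy two-sentence training set in place of BioScope.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def features(sent, i):
    """Simple token features; the real system uses a richer feature set."""
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_cue_lexicon": w.lower() in {"no", "not", "without", "denies"},
        "prev": sent[i - 1].lower() if i else "BOS",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "EOS",
    }

# Tiny toy training set: CUE marks a negation cue, B/I-SCOPE mark its scope.
sents = [["Patient", "denies", "chest", "pain", "."],
         ["No", "evidence", "of", "pneumonia", "."]]
labels = [["O", "CUE", "B-SCOPE", "I-SCOPE", "O"],
          ["CUE", "B-SCOPE", "I-SCOPE", "I-SCOPE", "O"]]

X = [[features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
test = [["No", "fever", "."]]
print(crf.predict([[features(s, i) for i in range(len(s))] for s in test]))
```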

12.

Objective

Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text.

Materials and Methods

Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (∼112 000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (∼1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores.

Results and Conclusion

Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org.

13.

Objective

Information extraction and classification of clinical data are current challenges in natural language processing. This paper presents a cascaded method to deal with three different extractions and classifications in clinical data: concept annotation, assertion classification and relation classification.

Materials and Methods

A pipeline system was developed for clinical natural language processing that includes a proofreading process, with gold-standard reflexive validation and correction. The information extraction system is a combination of a machine learning approach and a rule-based approach. The outputs of this system are used for evaluation in all three tiers of the fourth i2b2/VA shared-task and workshop challenge.

Results

Overall concept classification attained an F-score of 83.3% against a baseline of 77.0%; the optimal F-score for assertions about the concepts was 92.4%; and the relation classifier attained 72.6% for relationships between clinical concepts against a baseline of 71.0%. Micro-average results for the challenge test set were 81.79%, 91.90% and 70.18%, respectively.

Discussion

The challenge in the multi-task test requires a distribution of time and work load for each individual task so that the overall performance evaluation on all three tasks would be more informative rather than treating each task assessment as independent. The simplicity of the model developed in this work should be contrasted with the very large feature space of other participants in the challenge who only achieved slightly better performance. There is a need to charge a penalty against the complexity of a model as defined in message minimalisation theory when comparing results.

Conclusion

A complete pipeline system for constructing language processing models that can be used to process multiple practical detection tasks of language structures of clinical records is presented.

14.

Objective

A system that translates narrative text in the medical domain into structured representation is in great demand. The system performs three sub-tasks: concept extraction, assertion classification, and relation identification.

Design

The overall system consists of five steps: (1) pre-processing sentences, (2) marking noun phrases (NPs) and adjective phrases (APs), (3) extracting concepts that use a dosage-unit dictionary to dynamically switch two models based on Conditional Random Fields (CRF), (4) classifying assertions based on voting of five classifiers, and (5) identifying relations using normalized sentences with a set of effective discriminating features.

Measurements

Macro-averaged and micro-averaged precision, recall and F-measure were used to evaluate results.

Results

The performance is competitive with the state-of-the-art systems with micro-averaged F-measure of 0.8489 for concept extraction, 0.9392 for assertion classification and 0.7326 for relation identification.

Conclusions

The system exploits an array of common features and achieves state-of-the-art performance. Prudent feature engineering sets the foundation of our systems. In concept extraction, we demonstrated that switching models, one of which is especially designed for telegraphic sentences, improved extraction of the treatment concept significantly. In assertion classification, a set of features derived from a rule-based classifier were proven to be effective for classes such as conditional and possible. These classes would suffer from data scarcity in conventional machine-learning methods. In relation identification, we used a two-stage architecture, the second stage of which applies pairwise classifiers to possible candidate classes. This architecture significantly improves performance.
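Entries 13 and 14 report macro- and micro-averaged precision, recall and F-measure. For reference, the sketch below contrasts macro-averaging (average the per-class scores) with micro-averaging (pool the counts, then score once); the counts are illustrative, not the challenge data.

```python
# Per-class true positives / false positives / false negatives (illustrative counts).
counts = {"problem": (90, 10, 20), "treatment": (40, 5, 15), "test": (10, 10, 30)}

def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Macro: average the per-class F-measures.
macro_f = sum(prf(*c)[2] for c in counts.values()) / len(counts)
# Micro: pool the counts over all classes first, then compute a single F-measure.
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f = prf(tp, fp, fn)[2]
print(f"macro-F = {macro_f:.3f}, micro-F = {micro_f:.3f}")
```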

15.

Background

Automated text summarisers that find the best clinical evidence reported in collections of medical literature are of potential benefit for the practice of Evidence Based Medicine (EBM). Research and development of text summarisers for EBM, however, is impeded by the lack of corpora to train and test such systems.

Aims

To produce a corpus for research in EBM summarisation.

Method

We sourced the “Clinical Inquiries” section of the Journal of Family Practice (JFP) and obtained a sizeable sample of questions and evidence based summaries. We further processed the summaries by combining automated techniques, human annotations, and crowdsourcing techniques to identify the PubMed IDs of the references.

Results

The corpus has 456 questions, 1,396 answer components, 3,036 answer justifications, and 2,908 references.

Conclusion

The corpus is now available for the research community at http://sourceforge.net/projects/ebmsumcorpus.

16.
Applications of text mining in biomedicine and related systems and tools
This paper systematically introduces the workflow of biomedical text mining and the application of text-mining techniques in biomedicine, covering natural language processing and ontologies, named entity recognition, relation extraction, text classification and clustering, co-occurrence analysis, systems and tools and their evaluation, and visualization.

17.
To detect frontier knowledge promptly and identify it accurately, and to help researchers and decision-makers carry out related work, this paper draws on existing research to define multidimensional characteristics of frontier knowledge (life-cycle characteristics, general acceptance, authority, novelty, and interdisciplinarity) and methods for measuring them, proposes a research framework for frontier knowledge discovery in the medical domain, and explains the key issues in the framework, so as to inform subsequent research on detecting frontier knowledge in medicine.

18.
A nanopublication (Nanopublication) is a new model of semantic expression. Through a study of the nanopublication concept, its source files, and its semantic schema, this paper discusses the relationship between nanopublications and knowledge discovery from three aspects: the generation, improvement, and development of nanopublications in relation to knowledge discovery, so as to provide researchers with new leads for discovering new knowledge.

19.

Objective

Narratives of electronic medical records contain information that can be useful for clinical practice and multi-purpose research. This information needs to be put into a structured form before it can be used by automated systems. Coreference resolution is a step in the transformation of narratives into a structured form.

Methods

This study presents a medical coreference resolution system (MCORES) for noun phrases in four frequently used clinical semantic categories: persons, problems, treatments, and tests. MCORES treats coreference resolution as a binary classification task. Given a pair of concepts from a semantic category, it determines coreferent pairs and clusters them into chains. MCORES uses an enhanced set of lexical, syntactic, and semantic features. Some MCORES features measure the distance between various representations of the concepts in a pair and can be asymmetric.

Results and Conclusion

MCORES was compared with an in-house baseline that uses only single-perspective ‘token overlap’ and ‘number agreement’ features. MCORES was shown to outperform the baseline; its enhanced features contribute significantly to performance. In addition to the baseline, MCORES was compared against two available third-party, open-domain systems, RECONCILEACL09 and the Beautiful Anaphora Resolution Toolkit (BART). MCORES was shown to outperform both of these systems on clinical records.
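The in-house baseline above relies only on 'token overlap' and 'number agreement' features for pairwise coreference decisions. A minimal sketch of two such baseline-style features follows (MCORES' enhanced, asymmetric feature set is not reproduced):

```python
def token_overlap(m1: str, m2: str) -> float:
    """Fraction of shared tokens between two mentions (Jaccard overlap)."""
    t1, t2 = set(m1.lower().split()), set(m2.lower().split())
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0

def number_agreement(m1: str, m2: str) -> bool:
    """Crude singular/plural agreement check based on the head-word suffix."""
    plural = lambda m: m.lower().split()[-1].endswith("s")
    return plural(m1) == plural(m2)

pair = ("the chest x-ray", "x-ray")
print(token_overlap(*pair), number_agreement(*pair))
```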

20.

Objective

To evaluate existing automatic speech-recognition (ASR) systems to measure their performance in interpreting spoken clinical questions and to adapt one ASR system to improve its performance on this task.

Design and measurements

The authors evaluated two well-known ASR systems on spoken clinical questions: Nuance Dragon (both generic and medical versions: Nuance Gen and Nuance Med) and the SRI Decipher (the generic version SRI Gen). The authors also explored language model adaptation using more than 4000 clinical questions to improve the SRI system's performance, and profile training to improve the performance of the Nuance Med system. The authors reported the results with the NIST standard word error rate (WER) and further analyzed error patterns at the semantic level.

Results

Nuance Gen and Med systems resulted in a WER of 68.1% and 67.4% respectively. The SRI Gen system performed better, attaining a WER of 41.5%. After domain adaptation with a language model, the performance of the SRI system improved 36% to a final WER of 26.7%.

Conclusion

Without modification, two well-known ASR systems do not perform well in interpreting spoken clinical questions. With a simple domain adaptation, one of the ASR systems improved significantly on the clinical question task, indicating the importance of developing domain/genre-specific ASR systems.
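The evaluation above uses the NIST word error rate. For reference, WER is the word-level edit distance (substitutions, insertions and deletions) between the ASR hypothesis and the reference transcript, divided by the number of reference words; the example sentences are illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1] / len(ref)

print(wer("what is the dose of metformin", "what is dose of met for men"))
```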
