Similar Articles
1.
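Applications of Text Mining in the Biomedical Field and Related System Tools   Total citations: 4; self-citations: 2; citations by others: 2
Systematically introduces the workflow of biomedical text mining and the applications of text-mining techniques in the biomedical field. It focuses on natural language processing and ontologies, named entity recognition, relation extraction, text classification and clustering, co-occurrence analysis, system tools and their evaluation, and visualization.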
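Co-occurrence analysis is commonly implemented by counting how many sentences mention two terms together. The following is a minimal sketch, assuming case-insensitive substring matching in place of real entity recognition. The helper name `cooccurrence_counts` and the example sentences are hypothetical and are not taken from the reviewed article.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence


def cooccurrence_counts(sentences: Iterable[str], terms: Sequence[str]) -> Counter:
    """Count, for each pair of terms, the number of sentences mentioning both.

    Assumption: naive case-insensitive substring matching stands in for NER.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Sort so each unordered pair is always stored under the same key.
        present = sorted({t for t in terms if t.lower() in lowered})
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


sentences = [
    "BCR-ABL fusion drives chronic myeloid leukemia.",
    "Imatinib targets BCR-ABL.",
    "FLT3 mutations occur in acute myeloid leukemia.",
    "BCR-ABL is the hallmark of this leukemia.",
]
terms = ["leukemia", "BCR-ABL", "FLT3", "Imatinib"]
print(cooccurrence_counts(sentences, terms).most_common(1))  # [(('BCR-ABL', 'leukemia'), 2)]
```

Real systems would normalize synonyms and use proper entity recognition. Substring matching will, for example, also fire on a term embedded in a longer word.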

2.
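Few studies have applied bootstrapping to relation extraction in the medical domain, and basic domain-specific tools for medicine are lacking in China. To address these gaps, this study builds on general natural language processing techniques and adopts a bootstrapping algorithm framework. Relation patterns are constructed from shortest dependency paths, and a positivity evaluation of candidate entities is introduced into the filtering mechanism. The study presents new algorithm optimization strategies, evaluates system performance experimentally, and summarizes its contributions and limitations.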
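The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: bootstrapping with shortest-dependency-path patterns. It assumes dependency parses are already available as (head, dependent, label) triples, with words used as node identifiers. The function names and toy parses are hypothetical, and the paper's candidate-entity positivity filter is not reproduced.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (head, dependent, label); assumed parser output
Pattern = Tuple[str, ...]


def shortest_dependency_path(edges: Iterable[Edge], source: str, target: str) -> Optional[Pattern]:
    """BFS over the undirected dependency graph. Returns edge labels interleaved
    with the intermediate words, e.g. ("nsubj", "treats", "dobj")."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for head, dep, label in edges:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))
    prev: Dict[str, Optional[Tuple[str, str]]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt, label in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, label)
                queue.append(nxt)
    if target not in prev or source == target:
        return None
    # Walk back from target to source, collecting nodes and edge labels.
    nodes, labels = [target], []
    while prev[nodes[-1]] is not None:
        parent, label = prev[nodes[-1]]
        labels.append(label)
        nodes.append(parent)
    nodes.reverse()
    labels.reverse()
    pattern: List[str] = []
    for k, label in enumerate(labels):
        pattern.append(label)
        if k + 1 < len(labels):
            pattern.append(nodes[k + 1])  # intermediate word between edges k and k+1
    return tuple(pattern)


def bootstrap(sentences, seeds, iterations=2):
    """sentences: list of (edges, entity1, entity2); seeds: known related pairs."""
    known: Set[Tuple[str, str]] = set(seeds)
    patterns: Set[Pattern] = set()
    for _ in range(iterations):
        # 1) Learn patterns from sentences whose entity pair is already known.
        for edges, e1, e2 in sentences:
            if (e1, e2) in known:
                path = shortest_dependency_path(edges, e1, e2)
                if path:
                    patterns.add(path)
        # 2) Harvest new pairs whose path matches a learned pattern.
        for edges, e1, e2 in sentences:
            if shortest_dependency_path(edges, e1, e2) in patterns:
                known.add((e1, e2))
    return known, patterns


sentences = [
    ([("treats", "aspirin", "nsubj"), ("treats", "headache", "dobj")], "aspirin", "headache"),
    ([("treats", "ibuprofen", "nsubj"), ("treats", "fever", "dobj")], "ibuprofen", "fever"),
    ([("causes", "smoking", "nsubj"), ("causes", "cancer", "dobj")], "smoking", "cancer"),
]
known, patterns = bootstrap(sentences, seeds={("aspirin", "headache")})
print(sorted(known))  # [('aspirin', 'headache'), ('ibuprofen', 'fever')]
```

Without a filter like the candidate-entity evaluation the abstract mentions, bootstrapping tends to drift as overly generic patterns accumulate.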

3.
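Biomedical Text Mining: Steps and Tools   Total citations: 1; self-citations: 0; citations by others: 1
Introduces the steps of text-mining research in the biomedical field and the methods used at each step. It focuses on the tools and case studies for each step, with the aim of promoting biomedical text-mining research.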

4.
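Analyzes common methods for extracting entity relations from Chinese electronic medical record (EMR) data. It proposes a joint entity-relation extraction algorithm based on bidirectional encoder representations (BERT), which uses a cascade decoder and pointer tagging to perform both relation extraction and entity recognition. Experimental results show that the method effectively extracts entity relations from EMRs.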
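Pointer tagging typically marks the start and end positions of each span with separate scores. A cascade decoder first finds subjects and then, for each subject, finds relation-specific objects. The abstract does not specify the decoding rules. The following sketch assumes a common heuristic: pair each start with the nearest end at or after it, using a 0.5 threshold. The BERT-based model is replaced by a hypothetical `object_scorer` callback that returns per-relation start/end scores, and all scores below are made up.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end), end inclusive
Scores = Sequence[float]


def decode_spans(start_probs: Scores, end_probs: Scores, threshold: float = 0.5) -> List[Span]:
    """Pair each start above the threshold with the nearest end at or after it."""
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s, p in enumerate(start_probs):
        if p <= threshold:
            continue
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def cascade_decode(
    tokens: Sequence[str],
    subj_start: Scores,
    subj_end: Scores,
    object_scorer: Callable[[int, int], Dict[str, Tuple[Scores, Scores]]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, str]]:
    """Stage 1 decodes subjects; stage 2 decodes objects per subject and relation."""
    triples = []
    for s, e in decode_spans(subj_start, subj_end, threshold):
        subject = "".join(tokens[s : e + 1])
        # Hypothetical stand-in for the model's subject-conditioned object tagger.
        for relation, (obj_start, obj_end) in object_scorer(s, e).items():
            for os_, oe in decode_spans(obj_start, obj_end, threshold):
                triples.append((subject, relation, "".join(tokens[os_ : oe + 1])))
    return triples


tokens = list("阿司匹林缓解头痛")  # character-level tokens
subj_start = [0.9, 0, 0, 0, 0, 0, 0.1, 0]
subj_end = [0, 0, 0, 0.8, 0, 0, 0, 0.2]


def toy_object_scorer(s, e):
    # Made-up scores: one relation whose object spans positions 6-7.
    return {"relieves": ([0, 0, 0, 0, 0, 0, 0.7, 0], [0, 0, 0, 0, 0, 0, 0, 0.9])}


print(cascade_decode(tokens, subj_start, subj_end, toy_object_scorer))
# [('阿司匹林', 'relieves', '头痛')]
```

Conditioning the object tagger on each decoded subject is what lets a cascade decoder handle one subject that participates in several relations.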

5.
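Objective: To use deep learning to automatically extract open concept relations from Chinese biomedical text, in order to improve biomedical text understanding and support the construction of medical knowledge networks. Methods: A BiLSTM-CRF model was used to extract open concept relations, described by phrases in the sentence context, from Chinese biomedical literature. It was compared with methods based on conditional random fields (CRF) and on long short-term memory (LSTM) networks. Results: The BiLSTM-CRF-based method achieved an F1 score of 0.5221, significantly higher than the CRF-based method (F1 = 0.2353) and the LSTM-based method (F1 = 0.3355). Conclusion: Compared with using a CRF or LSTM model alone, the BiLSTM-CRF-based method for open concept relation extraction shows better robustness and generalization. It offers a useful reference for research on biomedical text understanding and medical knowledge network construction.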
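In a BiLSTM-CRF tagger, the BiLSTM produces per-token emission scores and the CRF layer adds tag-to-tag transition scores. Decoding selects the tag sequence with the highest total score using the Viterbi algorithm. The following is a minimal sketch of that decoding step only. It uses hypothetical scores and a hypothetical `O`/`B-REL`/`I-REL` tag set for relation phrases, and it omits the start/end transition scores that many CRF implementations also include. It does not reproduce the study's trained model.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]: score of tag j at token t (e.g. a BiLSTM output).
    transitions[i][j]: score of tag i being followed by tag j (the CRF layer).
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path], score[best]


TAGS = ["O", "B-REL", "I-REL"]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: O -> I-REL is heavily penalized
    [0.0, 0.0, 1.0],    # from B-REL
    [0.0, 0.0, 0.0],    # from I-REL
]
EMISSIONS = [
    [1.0, 0.8, 0.0],
    [0.0, 0.0, 1.5],
    [1.0, 0.0, 0.0],
]
best_tags, best_score = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(best_tags)  # ['B-REL', 'I-REL', 'O']
```

With these scores, picking the best tag independently at each position would give `O, I-REL, O`, which is invalid because `I-REL` follows `O`. The penalized `O` to `I-REL` transition lets Viterbi recover the well-formed `B-REL, I-REL, O` instead. This is the kind of structural constraint that a plain LSTM tagger without a CRF layer cannot enforce.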

6.
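Reviews the state of research on biomedical named entity recognition in China and abroad. It describes the technical approaches in detail, including dictionary- and rule-based methods, machine learning methods, hybrid methods, and neural network methods, as well as the relevant evaluation organizations and standards. It concludes by summarizing the difficulties and significance of Chinese biomedical named entity recognition.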

7.
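Gives an overview of the research background of biomedical text mining and introduces two biomedical text-mining tools: COREMINE medical and Chilibot. Both tools are then used to explore the interactions between leukemia and genes, leading to conclusions about specific interactions.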

8.
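Reviews research on named entity recognition and its model applications. Using classical Traditional Chinese Medicine (TCM) texts as the data source, the study applies deep learning to extract entities such as TCM diseases, formulas, and Chinese herbs. It designs a BiLSTM-CRF sequence-labeling model and builds an experimental corpus from TCM classics. The results show that the model achieves high accuracy.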
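Sequence-labeling output, such as BiLSTM-CRF tags, is usually converted to entity spans and scored at the span level. The following sketch assumes a BIO tag scheme with hypothetical `FORMULA` and `DISEASE` types and a made-up example sentence. The abstract does not state its tag scheme or evaluation metric. Treating a stray `I-` tag as `O` is one of several conventions.

```python
from typing import List, Sequence, Tuple

Entity = Tuple[str, int, int]  # (type, start, end), end inclusive


def bio_to_entities(tags: Sequence[str]) -> List[Entity]:
    """Convert BIO tags to typed spans; a stray I- tag is treated like O."""
    entities: List[Entity] = []
    etype, start = None, 0
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # current entity keeps going
        if etype is not None:
            entities.append((etype, start, i - 1))
            etype = None
        if tag.startswith("B-"):
            etype, start = tag[2:], i
    return entities


def span_f1(gold: Sequence[Entity], pred: Sequence[Entity]) -> float:
    """Exact-match span F1: an entity counts only if its type and both boundaries match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


chars = list("桂枝汤治太阳中风")
gold = ["B-FORMULA", "I-FORMULA", "I-FORMULA", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "I-DISEASE"]
pred = ["B-FORMULA", "I-FORMULA", "I-FORMULA", "O", "B-DISEASE", "I-DISEASE", "O", "O"]
print(span_f1(bio_to_entities(gold), bio_to_entities(pred)))  # 0.5
```

Under exact matching, the truncated disease span counts as both a false positive and a false negative. Lenient (overlap) matching would score it more generously.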

9.
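This study draws on two data sources: the reports and proceedings of domain-expert discussions at the International Biocuration Conferences held in 2015 and 2016, and research literature on Biocuration and Data Biocuration published in PubMed Central over the past five years. It applies content analysis to analyze, categorize, and summarize research themes in biomedical scientific data curation. It focuses on Biocuration working mechanisms, the construction and application of biomedical data standards, integration and visualization, curation and application, and biomedical text mining. The aim is to provide international experience for the development of biomedical scientific data curation in China.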

10.
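Introduces research results on three categories of biomedical ontology network integration tools developed in China and abroad: biomedical ontology network integration platforms, disease-drug ontology knowledge discovery tools, and gene-protein ontology integrated analysis tools. It analyzes their features and shortcomings and summarizes issues that deserve attention when developing ontology integration tools, in the hope of providing a reference for related researchers.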

11.
Objective

There have been various methods to deal with erroneous training data in distantly supervised relation extraction (RE); however, their performance is still far from satisfactory. We aimed to address the insufficient modeling of instance-label correlations when predicting biomedical relations, using deep learning and reinforcement learning.

Materials and Methods

In this study, a new computational model called piecewise attentive convolutional neural network and reinforcement learning (PACNN+RL) was proposed to perform RE on distantly supervised data generated from the Unified Medical Language System with MEDLINE abstracts, as well as on benchmark datasets. In PACNN+RL, PACNN was introduced to encode semantic information of biomedical text, and an RL method with a memory backtracking mechanism was leveraged to alleviate the erroneous data issue. Extensive experiments were conducted on 4 biomedical RE tasks.

Results

The proposed PACNN+RL model achieved competitive performance on 8 biomedical corpora, outperforming most baseline systems. Specifically, PACNN+RL outperformed all baseline methods with F1-scores of 0.5592 on the may-prevent dataset, 0.6666 on the may-treat dataset, and 0.3838 on the DDI 2011 corpus. For the protein-protein interaction RE task, we obtained new state-of-the-art performance on 4 out of 5 benchmark datasets.

Conclusions

The performance on many distantly supervised biomedical RE tasks was substantially improved, primarily owing to the denoising effect of the proposed model. It is anticipated that PACNN+RL will become a useful tool for large-scale RE and other downstream tasks that facilitate biomedical knowledge acquisition. We also made the demonstration program and source code publicly available at http://112.74.48.115:9000/.
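The abstract does not describe PACNN's layers. Its name suggests a piecewise CNN, in which convolution outputs are max-pooled separately over the three segments delimited by the two entity mentions, so the pooled vector keeps coarse positional structure. The following sketch shows only that piecewise max-pooling step, under that assumption. The attention component, the reinforcement-learning denoiser, and the segment boundary convention (each entity closes its segment) are not taken from the paper. The feature values are made up.

```python
from typing import List, Sequence


def piecewise_max_pool(features: Sequence[Sequence[float]], e1: int, e2: int) -> List[float]:
    """Max-pool each feature dimension over three segments split at entity
    positions e1 < e2, then concatenate the results (length 3 * dim).

    features: per-token vectors from a convolution layer (assumed given).
    """
    if not 0 <= e1 < e2 < len(features):
        raise ValueError("expected 0 <= e1 < e2 < len(features)")
    dim = len(features[0])
    segments = [features[: e1 + 1], features[e1 + 1 : e2 + 1], features[e2 + 1 :]]
    pooled: List[float] = []
    for segment in segments:
        if segment:
            pooled.extend(max(vec[k] for vec in segment) for k in range(dim))
        else:
            pooled.extend([0.0] * dim)  # empty segment (entity at sentence end): zero-fill
    return pooled


conv_out = [[1, 5], [3, 2], [0, 4], [6, 1], [2, 2]]  # 5 tokens, 2 filters (toy values)
print(piecewise_max_pool(conv_out, e1=1, e2=3))  # [3, 5, 6, 4, 2, 2]
```

A single max-pool over the whole sentence would return only `[6, 5]` here, losing the information about which side of each entity a feature fired on.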

12.
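With the development of information technology, the means of collecting, storing, and managing data have steadily improved, giving rise to the discipline of data mining. This article explains the concept of data mining. It presents application examples of various data mining methods in biomedical research and analyzes the relationship between data mining and statistics in the biomedical field. It also reviews the current state of biomedical data mining applications in China, the problems that remain to be solved, and future research directions.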

14.

Objective

Identification of clinical events (eg, problems, tests, treatments) and associated temporal expressions (eg, dates and times) are key tasks in extracting and managing data from electronic health records. As part of the i2b2 2012 Natural Language Processing for Clinical Data challenge, we developed and evaluated a system to automatically extract temporal expressions and events from clinical narratives. The extracted temporal expressions were additionally normalized by assigning type, value, and modifier.

Materials and methods

The system combines rule-based and machine learning approaches that rely on morphological, lexical, syntactic, semantic, and domain-specific features. Rule-based components were designed to handle the recognition and normalization of temporal expressions, while conditional random field (CRF) models were trained for event and temporal expression recognition.
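The type/value/modifier triple mirrors TimeML-style TIMEX3 annotation. Below is a minimal rule-based normalization sketch with the following assumptions:

- dates use the US mm/dd/yyyy format;
- a hypothetical admission date serves as the anchor for relative expressions;
- `"NA"` and `"APPROX"` are the modifier labels.

The three rules are illustrative only and far narrower than the system described.

```python
import re
from datetime import date, timedelta
from typing import Optional, Tuple

UNITS = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def normalize_timex(expr: str, anchor: date) -> Optional[Tuple[str, str, str]]:
    """Map a temporal expression to (type, value, modifier), or None if no rule fires.

    anchor: reference date for relative expressions (assumed: an admission date).
    """
    text = expr.strip().lower()
    mod = "NA"
    for prefix in ("approximately ", "about "):
        if text.startswith(prefix):
            mod, text = "APPROX", text[len(prefix):]
            break
    # Rule 1: absolute US-style date, mm/dd/yyyy (assumed format)
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)
    if m:
        month, day, year = map(int, m.groups())
        try:
            return ("DATE", date(year, month, day).isoformat(), mod)
        except ValueError:  # e.g. 13/40/2012 is not a real date
            return None
    # Rule 2: simple durations such as "3 days" -> ISO 8601 "P3D"
    m = re.fullmatch(r"(\d+)\s+(day|week|month|year)s?", text)
    if m:
        return ("DURATION", f"P{m.group(1)}{UNITS[m.group(2)]}", mod)
    # Rule 3: a relative date resolved against the anchor
    if text in ("the next day", "the following day"):
        return ("DATE", (anchor + timedelta(days=1)).isoformat(), mod)
    return None


admission = date(2012, 3, 14)
print(normalize_timex("03/15/2012", admission))            # ('DATE', '2012-03-15', 'NA')
print(normalize_timex("approximately 3 days", admission))  # ('DURATION', 'P3D', 'APPROX')
print(normalize_timex("the next day", admission))          # ('DATE', '2012-03-15', 'NA')
```

Relative expressions are where such rules tend to break down, because choosing the right anchor (admission date, surgery date, or the previous mention) is itself a modeling decision.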

Results

The system achieved micro F scores of 90% for the extraction of temporal expressions and 87% for clinical event extraction. The normalization component for temporal expressions achieved accuracies of 84.73% (expression's type), 70.44% (value), and 82.75% (modifier).

Discussion

Compared to the initial agreement between human annotators (87–89%), the system provided comparable performance for both event and temporal expression mining. While (lenient) identification of such mentions is achievable, finding the exact boundaries proved challenging.

Conclusions

The system provides a state-of-the-art method that can be used to support automated identification of mentions of clinical events and temporal expressions in narratives, either to support the manual review process or as part of large-scale processing of electronic health databases.

16.
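Objective: To build a platform for automatic recognition and annotation of Chinese biomedical entities and relations, providing a reference for Chinese biomedical corpus annotation, precision-medicine corpus accumulation, and knowledge services. Methods: Automatic entity recognition in Chinese biomedical text was implemented using a dictionary and the CRF algorithm. The automatic annotation platform for Chinese biomedical entities was built with Python, JavaScript, CSS, and related tools such as the jQuery framework. Results: The resulting platform automatically recognizes Chinese entities and supports uploading, annotating, reviewing, and finally storing texts. It recognizes text content efficiently and accurately and performs automatic annotation. Conclusion: The platform supports manual import of literature, annotation, and administrator review and settlement. It can support biomedical researchers in data mining and in building Chinese corpora.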
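Dictionary-based recognition is often implemented as forward maximum matching, which prefers the longest lexicon entry at each position. For example, "2型糖尿病" wins over the shorter "糖尿病". The abstract does not say how its dictionary component works or how it is combined with CRF. The following is a minimal sketch of the dictionary half only, using a hypothetical three-entry lexicon and hypothetical entity types.

```python
from typing import Dict, List, Tuple


def forward_max_match(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str, int, int]]:
    """Scan left to right, taking the longest lexicon term that starts at each position.

    Returns (term, entity_type, start, end) tuples with end exclusive.
    """
    max_len = max((len(term) for term in lexicon), default=0)
    entities = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i : i + length]
            if candidate in lexicon:
                entities.append((candidate, lexicon[candidate], i, i + length))
                i += length
                break
        else:  # no lexicon term starts here; advance one character
            i += 1
    return entities


lexicon = {"糖尿病": "Disease", "2型糖尿病": "Disease", "二甲双胍": "Drug"}
print(forward_max_match("2型糖尿病患者服用二甲双胍", lexicon))
# [('2型糖尿病', 'Disease', 0, 5), ('二甲双胍', 'Drug', 9, 13)]
```

Pure dictionary matching misses terms outside the lexicon and cannot resolve context-dependent ambiguity. A statistical model such as CRF is typically added to cover those cases.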

17.
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417