
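Mining Biological Entity Associations with a Multi-Feature-Fusion Linear-Kernel SVM Method
Cite this article: Wei Xing, Hu Dehua, Yi Minhan, Chang Xuelian, Yang Xiaodi, Zhu Wenjie. Mining Biological Entity Associations with a Multi-Feature-Fusion Linear-Kernel SVM Method [J]. Chinese Journal of Biomedical Engineering, 2018, 37(4): 451-460.
Authors: Wei Xing  Hu Dehua  Yi Minhan  Chang Xuelian  Yang Xiaodi  Zhu Wenjie
Affiliations: 1 (School of Basic Courses, Bengbu Medical College, Bengbu 233003, Anhui)
2 (Institute of Information Security and Big Data, Central South University, Changsha 410083)
3 (School of Basic Medicine, Bengbu Medical College, Bengbu 233003, Anhui)
Funding: National Natural Science Foundation of China (31570952, 81471702); Beijing Natural Science Foundation (3122010)
Abstract: Improving the performance of algorithms that mine entity associations from the biomedical literature can suggest new research directions. This paper proposes a linear-kernel SVM association-mining method with improved features, using abstracts of diabetes-related literature as the study material. Five types of features for mining entity associations are summarized: entity features, entity-pair features, dependency-graph features, parse-tree features, and noun-phrase-constraint features, of which the entity-pair and noun-phrase-constraint features are newly proposed. The Huber loss function is used as the linear kernel of the SVM classifier to mine and predict associations among disease, gene, and drug entities. The computation yielded 173 associations between 10 diabetes-related conditions and 23 genes, 79 associations between 13 diabetes-related conditions and 26 drugs, and 159 associations between 18 genes and 17 drugs. Association networks were constructed for disease-gene, disease-drug, gene-drug, and disease-gene-drug (covering 8 diabetes-related diseases) relations, totaling 619 entity associations, and 27 new entity association pairs were predicted. Finally, ROC curves were used to validate the three types of association (0.804, 0.847, and 0.742). The results show that, compared with the algorithms used in CoPub (0.710), PubGene (0.609), FBK-irst (0.547, 0.800), and WBI (0.510, 0.759), the proposed algorithm performs better: the highest accuracy improved by about 5% (0.847 vs. 0.800) and the lowest by more than 20% (0.742 vs. 0.510). This lays a good foundation for subsequent application to biomedical big data.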

Keywords: features  support vector machine (SVM)  association mining  diabetes  ROC curve
Received: 2017-06-27
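
The abstract above reports three ROC-based validation scores (0.804, 0.847, and 0.742). As a minimal sketch, assuming those values are areas under the ROC curve (the abstract does not say so explicitly), the following shows how such a score can be computed from classifier outputs using the pairwise-ranking formulation. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a true association, 0 for a non-association.
    scores: classifier outputs; higher means "more likely associated".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six candidate entity pairs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A score of 1.0 would mean every true association is ranked above every non-association, while 0.5 corresponds to random ranking.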

Extraction of Entity Interactions Based on Multiple Feature Fusion Linear Kernel SVM Approach
Wei Xing, Hu Dehua, Yi Minhan, Chang Xuelian, Yang Xiaodi, Zhu Wenjie. Extraction of Entity Interactions Based on Multiple Feature Fusion Linear Kernel SVM Approach [J]. Chinese Journal of Biomedical Engineering, 2018, 37(4): 451-460.
Authors:Wei Xing  Hu Dehua  Yi Minhan  Chang Xuelian  Yang Xiaodi  Zhu Wenjie
Institution: (School of Basic Courses, Bengbu Medical College, Bengbu 233003, Anhui, China)
(Institute of Information Security and Big Data, Central South University, Changsha 410083, China)
(School of Basic Medicine, Bengbu Medical College, Bengbu 233003, Anhui, China)
Abstract: Improving the performance of algorithms that mine entity interactions from the biomedical literature can help open up new lines of research. We propose a novel feature-based linear-kernel support vector machine (SVM) approach to extract and investigate the interactions among diabetes mellitus, genes, and drugs. Five types of features are used (entity, entity pair, dependency graph, parse tree, and noun phrase-constrained coordination), two of which are novel: the entity pair and noun phrase-constrained coordination features. We obtained 173 interactions between 13 kinds of diabetes mellitus and 23 genes, 79 interactions between 13 kinds of diabetes mellitus and 26 drugs, 159 interactions between 18 genes and 17 drugs, and 619 interactions among 8 kinds of diabetes mellitus, 23 genes, and 26 drugs; 27 new entity interactions were also predicted. We then constructed disease-gene, gene-drug, and disease-gene-drug interaction networks. Compared with the algorithms used in CoPub (0.710), PubGene (0.609), FBK-irst (0.547, 0.800), and WBI (0.510, 0.759), the proposed method raised the highest accuracy by about 5% (0.847 vs. 0.800) and the lowest by over 20% (0.742 vs. 0.510), offering perspectives for applications to biomedical big data.
Keywords:features  support vector machine (SVM)  extract interactions  diabetes mellitus  ROC curve  
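
The abstract describes fusing five feature types and classifying candidate entity pairs with a linear SVM that uses a Huber loss. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes fusion means concatenating the per-type feature vectors, and it swaps in the modified Huber loss (a smoothed hinge loss) trained by plain stochastic gradient descent. All function names, hyperparameters, and feature values are hypothetical.

```python
import random


def fuse(*feature_groups):
    """Assumed fusion step: concatenate per-type feature vectors."""
    fused = []
    for group in feature_groups:
        fused.extend(group)
    return fused


def modified_huber_grad(margin):
    """Derivative of the modified Huber loss w.r.t. the margin z = y * f(x).

    loss(z) = 0 for z >= 1, (1 - z)^2 for -1 <= z < 1, and -4z for z < -1.
    """
    if margin >= 1.0:
        return 0.0
    if margin >= -1.0:
        return -2.0 * (1.0 - margin)
    return -4.0


def train_linear(X, y, epochs=200, lr=0.05, l2=0.001, seed=0):
    """Fit f(x) = w.x + b by SGD on the L2-regularized modified Huber loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            f = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            g = modified_huber_grad(y[i] * f) * y[i]  # d(loss)/d(f)
            w = [wj - lr * (g * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Return +1 (interaction) or -1 (no interaction)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy features per candidate pair, in the order:
# entity (2 values), entity pair, dependency graph, parse tree, noun phrase.
X = [
    fuse([1, 0], [1], [0.9], [0.8], [1]),
    fuse([1, 0], [1], [0.7], [0.9], [1]),
    fuse([1, 1], [1], [0.8], [0.6], [0]),
    fuse([0, 0], [1], [0.9], [0.7], [1]),
    fuse([0, 1], [0], [0.1], [0.2], [0]),
    fuse([0, 1], [0], [0.3], [0.1], [0]),
    fuse([1, 1], [0], [0.2], [0.3], [0]),
    fuse([0, 0], [0], [0.1], [0.2], [1]),
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear(X, y)
preds = [predict(w, b, x) for x in X]
print(preds)
```

In practice the feature groups would be derived from parsed sentences rather than hand-set numbers, and the signed score `w.x + b` could feed an ROC analysis of the kind the abstract mentions.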
This article is indexed in CNKI and other databases.