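Comparison of three statistical methods based on gene expression profile data
Citation: SHI Xiao-wen, XIAO Chun, LIU Yun-liang, LIU Yan. Comparison of three statistical methods based on gene expression profile data[J]. Practical Preventive Medicine, 2018, 25(2): 155-159.
Authors: SHI Xiao-wen, XIAO Chun, LIU Yun-liang, LIU Yan
Institution: Department of Health Statistics, Harbin Medical University, Harbin, Heilongjiang 150081, China
Funding: National Natural Science Foundation of China (81172741; 30972537)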
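Abstract: Objective: To compare the variable selection and predictive discrimination abilities of three methods, SCAD-support vector machine (SCAD-SVM), support vector machine (SVM) and elastic net, on gene expression profile data. Methods: Simulated gene expression profile data under different conditions were generated from preset parameters, and real data were also analyzed. The variable selection and predictive discrimination abilities of the three methods were evaluated from three aspects: the false discovery rate (FDR), the consistency error rate and the area under the ROC curve (AUC). Results: The simulation experiments showed that, with the number of differential variables held fixed, the variable selection and predictive discrimination abilities of the models built by all three methods improved as the correlation coefficient among the differential variables increased. With the correlation coefficient held fixed, as the number of differential variables increased, the variable selection and predictive discrimination abilities of SCAD-SVM and elastic net tended to decline, whereas those of SVM tended to improve. Conclusions: SCAD-SVM not only overcomes the limitation that SVM cannot perform variable selection directly, but also improves model precision and classification accuracy. Overall, SCAD-SVM performs better in variable selection and predictive discrimination, and on gene expression profile data with highly correlated variables it can achieve higher prediction accuracy and more stable model estimates.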

Keywords: SCAD-SVM  elastic net  consistency error rate  area under the ROC curve
Received: 2017-03-22
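For reference, the penalties that distinguish two of the compared methods can be written in their standard literature forms. The abstract gives no formulas, so the following is an assumed standard specification, not the authors' exact one. It covers three pieces. The first is the SCAD penalty of Fan and Li (2001), with tuning constants λ > 0 and a > 2 (a = 3.7 is their suggested default). The second pairs that penalty with the SVM hinge loss, as in the SCAD-SVM of Zhang et al. (2006). The third is the elastic net penalty in the glmnet parameterization, with mixing weight α, attached to a generic loss L that the abstract does not specify (logistic loss would be a common choice for two-class data).

```latex
% SCAD penalty on a single coefficient w (lambda > 0, a > 2)
p_\lambda(|w|) =
\begin{cases}
  \lambda |w|, & |w| \le \lambda,\\[4pt]
  -\dfrac{|w|^2 - 2a\lambda |w| + \lambda^2}{2(a-1)}, & \lambda < |w| \le a\lambda,\\[4pt]
  \dfrac{(a+1)\lambda^2}{2}, & |w| > a\lambda.
\end{cases}

% SCAD-SVM: hinge loss plus SCAD penalty; labels y_i in {-1, +1}, d genes
\min_{b,\,\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n}\bigl[1 - y_i(b + \mathbf{w}^\top \mathbf{x}_i)\bigr]_+
  + \sum_{j=1}^{d} p_\lambda(|w_j|)

% Elastic net: generic loss L plus mixed L1/L2 penalty (0 <= alpha <= 1)
\min_{\beta_0,\,\boldsymbol\beta} \; L(\beta_0, \boldsymbol\beta)
  + \lambda\Bigl[\alpha \|\boldsymbol\beta\|_1 + \tfrac{1-\alpha}{2}\|\boldsymbol\beta\|_2^2\Bigr]
```

The squared L2 penalty of a standard SVM shrinks coefficients but never sets them exactly to zero. SCAD behaves like an L1 penalty near zero, so small coefficients can be zeroed, and it is constant beyond aλ, so large coefficients are not shrunk further. This is the mechanism that lets SCAD-SVM select genes directly.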

Comparison of three statistical methods based on gene expression profile data
SHI Xiao-wen,XIAO Chun,LIU Yun-liang,LIU Yan.Comparison of three statistical methods based on gene expression profile data[J].Practical Preventive Medicine,2018,25(2):155-159.
Authors:SHI Xiao-wen  XIAO Chun  LIU Yun-liang  LIU Yan
Institution:Department of Medical Statistics, Harbin Medical University, Harbin, Heilongjiang 150081, China
Abstract: Objective: To compare the variable selection and predictive abilities of three methods on gene expression profile data: smoothly clipped absolute deviation-support vector machine (SCAD-SVM), support vector machine (SVM) and Elastic Net. Methods: Simulated gene expression profile data under different conditions were generated according to preset parameters, and real colon cancer data were also analyzed. The false discovery rate (FDR), the consistency error rate and the area under the ROC curve (AUC) were used to evaluate the variable selection and predictive abilities of the three methods. Results: The simulation showed that, when the number of differential variables was fixed, the variable selection and predictive abilities of the models built by all three methods improved as the correlation coefficient between differential variables increased. When the correlation coefficient between differential variables was fixed and the number of differential variables increased, the variable selection and predictive abilities of SCAD-SVM and Elastic Net tended to decline, whereas those of SVM tended to improve. Conclusions: SCAD-SVM not only overcomes the limitation that SVM cannot perform variable selection directly, but also improves the precision and prediction accuracy of the resulting model. On the whole, SCAD-SVM is superior in variable selection and predictive ability, and it can achieve higher prediction precision and more stable model estimates on gene expression profile data with highly correlated variables.
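The simulation design varies two factors: the number of differential variables and the correlation among them. The following is a minimal standard-library sketch of one way to generate such data, not the authors' generating model, which the abstract does not describe. It assumes a two-class design. The first k genes are shifted by delta in class 1 and share pairwise correlation rho through a common latent factor. The remaining genes are independent N(0, 1) noise. The function name and the parameters n_per_class, p, k, rho and delta are hypothetical.

```python
import math
import random


def simulate_expression(n_per_class, p, k, rho, delta, seed=0):
    """Simulate a two-class expression matrix (hypothetical design).

    Genes 0..k-1 are 'differential': in class 1 their mean is shifted by
    delta, and every pair of them has correlation rho via a shared factor.
    Genes k..p-1 are independent N(0, 1) noise in both classes.
    Returns (X, y): X is a list of rows (samples), y holds labels 0/1.
    """
    if not 0 <= k <= p:
        raise ValueError("need 0 <= k <= p")
    if not 0 <= rho < 1:
        raise ValueError("this construction needs 0 <= rho < 1")
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            z = rng.gauss(0.0, 1.0)  # latent factor shared by differential genes
            row = []
            for j in range(p):
                if j < k:
                    # sqrt(rho)*z + sqrt(1-rho)*e has unit variance and
                    # correlation rho with every other differential gene
                    value = math.sqrt(rho) * z + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                    value += delta * label  # mean shift only in class 1
                else:
                    value = rng.gauss(0.0, 1.0)  # non-differential noise gene
                row.append(value)
            X.append(row)
            y.append(label)
    return X, y
```

For example, `simulate_expression(50, 1000, 10, 0.5, 1.0)` would give 100 samples of 1000 genes, 10 of them differential. Because the shared factor enters with weight sqrt(rho) and the gene-specific noise with weight sqrt(1 - rho), each differential gene keeps unit variance within a class. This construction only covers non-negative rho.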
Keywords:SCAD-SVM  Elastic Net  consistency error rate  the area under the ROC curve  
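Two of the three evaluation criteria can be sketched in a few lines of Python. The sketch makes two assumptions. First, FDR here means the share of selected genes that are not truly differential in a simulated data set; strictly, this is the false discovery proportion, which would be averaged over simulation replicates. Second, AUC is computed from a model's decision scores on test samples. The function names are hypothetical. The consistency error rate is not sketched because the abstract does not define it.

```python
def selection_fdr(selected, true_differential):
    """Share of selected variables that are not truly differential."""
    selected = set(selected)
    if not selected:
        return 0.0  # convention: nothing selected means no false discoveries
    false_hits = selected - set(true_differential)
    return len(false_hits) / len(selected)


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (labels 1/0; ties count 1/2).

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `selection_fdr([0, 1, 2, 7], range(5))` returns 0.25 (gene 7 is the one false hit among four selected genes). `roc_auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0])` returns 0.75. The pairwise loop costs O(n_pos × n_neg), which is acceptable at the sample sizes typical of expression studies. A rank-based formula gives the same value in O(n log n).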
This article is indexed in CNKI, VIP and other databases.