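Gene selection and classification prediction with the weighted SAM method in gene expression data
Cite this article: REN Yu-dong, LU Zhen, LI Jing-wei, LIU Yan. Gene selection and classification prediction of weighted SAM method in gene expression data[J]. Practical Preventive Medicine, 2020, 27(12): 1537-1539.
Authors: REN Yu-dong  LU Zhen  LI Jing-wei  LIU Yan
Affiliation: Department of Health Statistics, Harbin Medical University, Harbin, Heilongjiang 150081
Funding: Natural Science Foundation of Heilongjiang Province (LH2019H005)
Abstract: Objective To improve the significance analysis of microarray (SAM) method with a Gaussian kernel function and a Euclidean distance function, yielding the modified significance analysis of microarray-1 (MSAM1) and modified significance analysis of microarray-2 (MSAM2) methods, and to compare them with the SAM, Relief, and support vector machine recursive feature elimination (SVM-RFE) methods in order to evaluate the gene selection and classification prediction ability of MSAM1 and MSAM2 in gene expression data. Methods The leukemia data set was obtained from the golubEsets package in Bioconductor (Golub et al. identified 50 differentially expressed genes in this data set). The five algorithms were implemented in R. Gene selection ability was evaluated by the correct rate, and classification prediction ability by the area under the ROC curve (AUC). The Kruskal-Wallis H test was used to compare the between-group differences in correct rate and AUC among the five methods, and the SNK-q test was used for subsequent pairwise comparisons. Results For both correct rate and AUC, MSAM1 and MSAM2 performed best, followed by SAM and SVM-RFE, with Relief ranked last. The between-group differences among the five methods were statistically significant (H=150.333, P<0.0001 and H=293.2579, P<0.0001). Pairwise comparisons showed no statistically significant difference between MSAM1 and MSAM2 (P>0.05), but the differences between each of these two methods and the other three methods were statistically significant (P<0.05). Conclusions The weighted SAM method improved with a Gaussian kernel function and a Euclidean distance function enhances the gene selection and classification prediction ability of the SAM method and yields more stable results when applied to real gene expression data.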
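For readers unfamiliar with SAM, the following is a minimal Python sketch of the standard SAM relative-difference statistic that the modified methods build on. The abstract does not specify how the Gaussian kernel or Euclidean distance weights enter the statistic, so no weighting is shown here. The function names `sam_statistic` and `rank_genes`, the dictionary layout of the expression data, and the treatment of the fudge factor `s0` as a user-supplied constant are all illustrative assumptions, not the authors' R implementation.

```python
import math
from statistics import mean


def sam_statistic(group_a, group_b, s0=0.0):
    """Standard SAM relative difference d for one gene.

    d = (mean_a - mean_b) / (s + s0), where s is the pooled standard
    error of the mean difference and s0 is a small positive constant
    (assumed here to be supplied by the caller).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled sum of squared deviations over both groups.
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    s = math.sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


def rank_genes(expr, labels, s0=0.0):
    """Rank genes by |d|, largest first.

    expr:   hypothetical layout, dict mapping gene name -> list of values
    labels: list of 0/1 class labels aligned with those values
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 1]
        b = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = sam_statistic(a, b, s0)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

A larger `s0` shrinks the statistic for genes with very small variance, which is the usual reason for including it; a weighted variant would presumably modify how samples contribute to the means and variances, but that detail is not recoverable from the abstract.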

Keywords: SAM  gene expression data  gene selection  classification prediction
Received: 2020-02-03

Gene selection and classification prediction of weighted SAM method in gene expression data
REN Yu-dong, LU Zhen, LI Jing-wei, LIU Yan. Gene selection and classification prediction of weighted SAM method in gene expression data[J]. Practical Preventive Medicine, 2020, 27(12): 1537-1539.
Authors:REN Yu-dong  LU Zhen  LI Jing-wei  LIU Yan
Institution:Department of Health Statistics, Harbin Medical University, Harbin, Heilongjiang 150081, China
Abstract: Objective The modified significance analysis of microarray-1 (MSAM1) and modified significance analysis of microarray-2 (MSAM2) methods were obtained by improving the significance analysis of microarray (SAM) method with a Gaussian kernel function and a Euclidean distance function, respectively. They were compared with the original SAM method, the support vector machine recursive feature elimination (SVM-RFE) method, and the Relief method to evaluate the gene selection and classification prediction ability of MSAM1 and MSAM2 in gene expression data. Methods The leukemia data set was obtained from the golubEsets package in Bioconductor (Golub et al. identified 50 differentially expressed genes in this data set). The five gene selection methods were implemented in R. Gene selection ability and classification prediction ability were evaluated by the correct rate and the area under the ROC curve (AUC), respectively. The Kruskal-Wallis H test was used to compare the between-group differences in correct rate and AUC among the five methods, and the SNK-q test was used for subsequent pairwise comparisons. Results For both correct rate and AUC, MSAM1 and MSAM2 performed best, followed by the SAM and SVM-RFE methods, with the Relief method ranked last. The between-group differences among the five methods were statistically significant (H=150.333, P<0.0001; H=293.2579, P<0.0001). Pairwise comparisons showed no statistically significant difference between MSAM1 and MSAM2 (P>0.05), but the differences between each of these two methods and the other three methods were statistically significant (P<0.05). Conclusions The weighted SAM method modified with a Gaussian kernel function and a Euclidean distance function improves the gene selection and classification prediction ability of the SAM method and yields more stable results when applied to real gene expression data.
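The evaluation described in the Methods can be sketched in Python as follows. This is a hedged illustration only: the abstract does not define the correct rate, so it is assumed here to be the fraction of selected genes that appear in the 50-gene reference list; the AUC is computed with the rank-based (Mann-Whitney) formulation; and the Kruskal-Wallis H statistic is computed with average ranks but without a tie correction. The function names are hypothetical and the code is not the authors' R implementation.

```python
def correct_rate(selected, reference):
    """Assumed definition: share of selected genes found in the reference list."""
    selected = list(selected)
    reference = set(reference)
    return sum(g in reference for g in selected) / len(selected)


def auc(scores, labels):
    """Rank-based AUC: probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H over a list of groups (no tie correction applied)."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Tied values share the mean of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
```

In a setup like the one described, each method's correct rate and AUC would be collected over repeated runs and the per-method samples passed to `kruskal_wallis_h` as separate groups; the SNK-q pairwise comparison is not sketched here.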
Keywords:significance analysis of microarray  gene expression data  gene selection  classification prediction  