Gene selection and classification prediction of weighted SAM method in gene expression data



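Cite this article:	REN Yu-dong, LU Zhen, LI Jing-wei, LIU Yan. Gene selection and classification prediction of weighted SAM method in gene expression data[J]. Practical Preventive Medicine, 2020, 27(12): 1537-1539.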

Authors:	REN Yu-dong, LU Zhen, LI Jing-wei, LIU Yan

Affiliation:	Department of Health Statistics, Harbin Medical University, Harbin, Heilongjiang 150081, China

Funding:	Natural Science Foundation of Heilongjiang Province (LH2019H005)

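Abstract:	Objective To improve the significance analysis of microarray (SAM) method with the Gaussian kernel function and the Euclidean distance function, yielding the MSAM1 (modified significance analysis of microarray-1) and MSAM2 (modified significance analysis of microarray-2) methods, and to compare them with the SAM method, the Relief method, and support vector machine recursive feature elimination (SVM-RFE) in order to evaluate the gene selection and classification prediction ability of MSAM1 and MSAM2 in gene expression data. Methods The leukemia dataset was obtained from the golubEsets package in Bioconductor (Golub et al. reported the 50 differentially expressed genes contained in this dataset). The five algorithms were implemented in R. Gene selection ability was evaluated by the correct rate, and classification prediction ability by the area under the ROC curve (AUC). The Kruskal-Wallis H test was used to compare between-group differences in the correct rate and AUC among the five methods, and the SNK-q test was used for further pairwise comparisons. Results For both the correct rate and the AUC, MSAM1 and MSAM2 performed best, followed by SAM and SVM-RFE, with Relief last. The between-group differences among the five methods were statistically significant (H=150.333, P<0.0001 and H=293.2579, P<0.0001). Pairwise comparisons showed that although the difference between MSAM1 and MSAM2 was not statistically significant (P>0.05), the differences between these two methods and the other three were all statistically significant (P<0.05). Conclusions The weighted SAM method improved with the Gaussian kernel function and the Euclidean distance function enhances the gene selection and classification prediction ability of SAM and can yield more stable results when applied to real gene expression data.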
Keywords:	SAM; gene expression data; gene selection; classification prediction
Received:	2020-02-03
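
For readers unfamiliar with SAM, the following is a minimal sketch of the commonly cited SAM relative-difference statistic, d = (mean1 - mean2) / (s + s0), which ranks genes by how strongly they differ between two classes. It is written in Python for illustration only (the study itself used R). The abstract does not define the Gaussian-kernel or Euclidean-distance weighting used in MSAM1 and MSAM2, so no weighting is shown. The function names, the toy data, and the fixed value of the fudge factor s0 are assumptions made for this example; SAM normally chooses s0 from the data.

```python
import math

def sam_statistic(group1, group2, s0=0.0):
    """Unweighted SAM relative difference for one gene (illustrative sketch)."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - mean1) ** 2 for x in group1) + sum((x - mean2) ** 2 for x in group2)
    # Gene-specific scatter: pooled standard error of the mean difference.
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    # s0 is a small positive constant that damps genes with tiny variance.
    return (mean1 - mean2) / (s + s0)

def rank_genes(expression, labels, s0=0.0):
    """Sort gene names by |d|, largest first; labels are 0/1 per sample."""
    scores = {}
    for gene, values in expression.items():
        g1 = [v for v, y in zip(values, labels) if y == 1]
        g0 = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = sam_statistic(g1, g0, s0)
    ranking = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranking, scores

# Hypothetical toy data (not from the leukemia dataset): 3 genes, 6 samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_expression = {
    "geneA": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "geneB": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],
    "geneC": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
}
toy_ranking, toy_scores = rank_genes(toy_expression, toy_labels, s0=1.0)
```

Selecting the top-ranked genes from such a list is one plausible way to perform the gene selection step; how the study compared the selected genes against the 50 reference genes is not described in the abstract.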
Gene selection and classification prediction of weighted SAM method in gene expression data

REN Yu-dong, LU Zhen, LI Jing-wei, LIU Yan. Gene selection and classification prediction of weighted SAM method in gene expression data[J]. Practical Preventive Medicine, 2020, 27(12): 1537-1539.

Authors:	REN Yu-dong, LU Zhen, LI Jing-wei, LIU Yan

Institution:	Department of Health Statistics, Harbin Medical University, Harbin, Heilongjiang 150081, China

Abstract:	Objective The modified significance analysis of microarray-1 (MSAM1) method and the modified significance analysis of microarray-2 (MSAM2) method were obtained by improving the significance analysis of microarray (SAM) method with the Gaussian kernel function and the Euclidean distance function, respectively. They were compared with the original SAM method, the Relief method, and the support vector machine recursive feature elimination (SVM-RFE) method to evaluate the gene selection and classification prediction ability of MSAM1 and MSAM2 in gene expression data. Methods The leukemia data set was obtained from the golubEsets package in Bioconductor (Golub et al. reported 50 differential genes contained in the data set). The five gene selection methods were implemented in R. Gene selection ability was evaluated by the correct rate, and classification prediction ability by the area under the ROC curve (AUC). The Kruskal-Wallis H test was used to compare the between-group differences in the correct rate and AUC value among the five methods, and the SNK-q test was used for further pairwise comparison. Results Both the correct rate and the AUC value were best for MSAM1 and MSAM2, followed by the SAM and SVM-RFE methods, and the Relief method ranked last. The between-group differences among the five methods were statistically significant (H=150.333, P<0.0001; H=293.2579, P<0.0001). Pairwise comparison showed no statistically significant difference between MSAM1 and MSAM2 (P>0.05), but the differences between these two methods and the other three methods were statistically significant (P<0.05). Conclusions The weighted SAM method modified by the Gaussian kernel function and the Euclidean distance function improves the gene selection and classification prediction ability of the SAM method and can give more stable analysis results when applied to actual gene expression data.

Keywords:	significance analysis of microarray; gene expression data; gene selection; classification prediction
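
The abstract evaluates classification with the AUC and compares the five methods with the Kruskal-Wallis H test. As a rough illustration of what those two quantities measure, the sketch below computes the AUC through its rank-sum (Mann-Whitney) form and the Kruskal-Wallis H statistic without tie correction, in plain Python. The function names and all numbers are made up for the example and are not the study's data or code; the study used R, and the SNK-q pairwise comparison is not shown.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def auc(scores, labels):
    """AUC as the probability that a positive sample outranks a negative one."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Hypothetical classifier scores and true labels (1 = positive class).
example_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])

# Hypothetical per-run AUC values for three methods, one list per method.
example_h = kruskal_wallis_h([[0.71, 0.74, 0.69], [0.80, 0.83, 0.82], [0.91, 0.93, 0.95]])
```

Under the null hypothesis H is approximately chi-square distributed with (number of groups - 1) degrees of freedom, which is how a P value such as those reported above would be obtained.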
