
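A comparative study of two partial least squares-based classification models for multiclass classification of tumor gene expression data
Cite this article: Jin Zhichao, Lu Jian, Wu Cheng, Gao Qingbin, Sun Yalin, He Jia. A comparative study of two partial least squares-based classification models for multiclass classification of tumor gene expression data [J]. Chinese Journal of Health Statistics (中国卫生统计), 2009, 29(5).
Authors: Jin Zhichao (金志超), Lu Jian (陆健), Wu Cheng (吴骋), Gao Qingbin (高青斌), Sun Yalin (孙亚林), He Jia (贺佳)
Affiliation: Department of Health Statistics, Second Military Medical University (第二军医大学), 200433
Funding: National Natural Science Foundation of China (three grants); Shanghai Key Basic Research Project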
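Abstract: Objective: To compare how two classification models based on partial least squares (PLS) perform in multiclass classification of tumor gene expression data, and to compare how different methods of selecting differentially expressed genes affect the classification results. Methods: Using four tumor gene expression datasets, including NCI60, differentially expressed genes were selected by four different methods, and two PLS-based methods were then applied for multiclass classification. The first, PLS linear discriminant analysis, uses PLS for dimension reduction and then takes the resulting components as input variables for linear discriminant analysis. The second, PLS discriminant analysis, performs classification directly by PLS regression. Classification performance was evaluated by leave-one-out and 10-fold cross-validation. Results: PLS discriminant analysis classified slightly better than linear discriminant analysis after PLS dimension reduction. Classification was best when differentially expressed genes were selected by the variable importance index, followed by the SAM method. Conclusion: For multiclass classification of tumor gene expression data, PLS is both an efficient dimension reduction method and a practical classification method.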

Keywords: tumor gene expression data; partial least squares; multiclass classification

A Comparative Study of Two Multiclass Classification Methods Based on Partial Least Squares Using Tumor Microarray Gene Expression Data
Abstract: Objective: To compare two multiclass classification methods based on partial least squares (PLS) on tumor microarray gene expression data, and to compare the influence of four different methods of selecting significant genes on the classification models. Methods: Significant genes were selected from four real tumor microarray gene expression datasets by four different methods, and two classification methods were then applied. The first uses PLS to reduce the dimension and then performs classification by linear discriminant analysis (PLS-LDA). The second performs classification directly by PLS regression (PLS-DA). Leave-one-out cross-validation and 10-fold cross-validation were used to evaluate the classification models. Results: PLS-DA outperformed PLS-LDA on the four microarray gene expression datasets. Conclusion: PLS is a powerful and versatile tool for dimension reduction and multiclass classification.
Keywords: Tumor microarray gene expression data; Partial least squares method; Multiclass classification
This article is indexed in databases including Wanfang Data (万方数据).