Partial least squares and logistic regression random-effects estimates for gene selection in supervised classification of gene expression data
Institution: 1. Department of Statistics, University of Leeds, Leeds, United Kingdom; 2. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Abstract: Our main interest in supervised classification of gene expression data is to infer whether the expressions can discriminate biological characteristics of samples. With thousands of gene expressions to consider, gene selection has been advocated to decrease classification error by including only the discriminating genes. We propose to base the gene selection on partial least squares (PLS) and logistic regression random-effects (RE) estimates before the selected genes are evaluated in classification models. We compare this selection with that based on the two-sample t-statistics, a current practice, and on modified t-statistics. The results indicate that gene selection based on logistic regression RE estimates is recommended in general, while selection based on the PLS estimates is recommended when the number of samples is low. Gene selection based on the modified t-statistics performs well when the genes exhibit moderate-to-high variability with moderate group separation. Respecting the characteristics of the data is a key aspect to consider in gene selection.
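To make the baseline concrete, the following is a minimal sketch of filtering genes by two-sample t-statistics, the current practice the abstract compares against. It is not the authors' implementation: the function name `t_statistic_filter`, the use of the Welch (unequal-variance) form of the statistic, the simulated data, and the choice of `k` are all assumptions made for illustration.

```python
import numpy as np

def t_statistic_filter(X, y, k):
    """Return indices of the k genes with the largest |t|, plus all t values.

    X: (n_samples, n_genes) expression matrix (hypothetical layout).
    y: binary class labels (0/1), one per sample.
    Assumption: Welch two-sample t-statistic, computed gene by gene.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    g0, g1 = X[y == 0], X[y == 1]
    n0, n1 = g0.shape[0], g1.shape[0]
    mean_diff = g1.mean(axis=0) - g0.mean(axis=0)
    # Standard error of the mean difference under unequal variances
    se = np.sqrt(g0.var(axis=0, ddof=1) / n0 + g1.var(axis=0, ddof=1) / n1)
    t = mean_diff / se
    # Sort genes by decreasing absolute t-statistic and keep the top k
    top = np.argsort(-np.abs(t))[:k]
    return top, t

# Simulated data (illustrative only): 20 samples, 1000 genes,
# with the first five genes shifted in class 1.
rng = np.random.default_rng(0)
n_samples, n_genes = 20, 1000
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :5] += 3.0
selected, t = t_statistic_filter(X, y, k=5)
```

The selected columns `X[:, selected]` would then be passed to a classification model. Selection based on PLS or logistic regression RE estimates would replace the ranking statistic `t` with the corresponding estimates.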
Keywords: Supervised classification; Gene selection; Filtering; Partial least squares; Logistic regression; Random effects
This article is indexed in ScienceDirect and other databases.