Gene and sample selection using T-score with sample selection |
| |
Institution: | 1. Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore; 2. Singapore-MIT Alliance, Singapore; 3. Department of Biological Engineering, Massachusetts Institute of Technology, USA |
| |
Abstract: | Gene selection from high-dimensional microarray gene-expression data is a statistically challenging problem. Filter approaches to gene selection have been popular because of their simplicity, efficiency, and accuracy. Because sample sizes are small, all samples are generally used to compute the relevant ranking statistics, and the selection of samples in filter-based gene selection methods has not been addressed. In this paper, we extend a previously proposed approach for simultaneous sample and gene selection. In a backward elimination method, a modified logistic regression loss function is used to select relevant samples at each iteration, and these samples are used to compute the T-score to rank genes. This method provides a compromise between the T-score and support vector machine (SVM) based algorithms. The performance is demonstrated on both simulated and real datasets using criteria such as classification performance, stability, and redundancy. Results indicate that the computational complexity and stability of the method are improved compared to SVM-based methods without compromising classification performance. |
| |
Keywords: | Feature selection; Gene expression; Logistic regression; SVM-RFE; Approximate support vectors |
This article is indexed in ScienceDirect and other databases. |
|
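The backward elimination loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `t_scores`, `backward_eliminate`, and `use_all_samples` are hypothetical, the T-score is assumed to be the unequal-variance two-sample form, and the abstract does not specify the modified logistic regression loss, so the sample selector is left as a pluggable callable whose placeholder simply keeps every sample.

```python
import numpy as np


def t_scores(X, y, sample_mask=None):
    """Absolute two-sample T-score for each gene (unequal-variance form).

    X: array of shape (n_samples, n_genes); y: binary labels in {0, 1}.
    sample_mask: optional boolean array choosing which samples to use.
    """
    if sample_mask is not None:
        X, y = X[sample_mask], y[sample_mask]
    pos, neg = X[y == 1], X[y == 0]
    diff = pos.mean(axis=0) - neg.mean(axis=0)
    spread = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                     + neg.var(axis=0, ddof=1) / len(neg))
    return np.abs(diff) / (spread + 1e-12)  # small constant avoids 0/0


def use_all_samples(X, y):
    """Placeholder selector (assumption): keeps every sample.

    The paper selects samples with a modified logistic regression loss;
    that rule is not given in the abstract, so it is not reproduced here.
    """
    return np.ones(len(y), dtype=bool)


def backward_eliminate(X, y, select_samples=use_all_samples):
    """Rank genes by repeatedly dropping the lowest-scoring one.

    At each iteration the selector picks samples using the genes that are
    still active, and the T-score is computed on those samples only.
    Returns gene indices ordered from most to least relevant.
    """
    active = list(range(X.shape[1]))
    eliminated = []
    while len(active) > 1:
        mask = select_samples(X[:, active], y)
        scores = t_scores(X[:, active], y, mask)
        worst = int(np.argmin(scores))
        eliminated.append(active.pop(worst))
    return active + eliminated[::-1]
```

To experiment with a different sample-selection rule, pass any callable that takes the currently active expression matrix and the labels and returns a boolean mask; each class needs at least two selected samples for the variance estimate to be defined.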