Similar literature
 20 similar articles found; search took 15 ms
1.
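A gene selection method based on SVM-RFE-SFS   Total citations: 2, self-citations: 0, citations by others: 2
Gene microarray data usually contain a large amount of data irrelevant to tumor classification, which severely reduces the accuracy of tumor diagnosis. Microarray data also suffer from small sample sizes and high dimensionality, which further complicates tumor diagnosis, so gene selection is necessary. A new gene selection method is proposed that combines the support vector machine (SVM) with recursive feature elimination (RFE) and sequential forward selection (SFS). First, an SVM is used to compute a ranking criterion score for each gene, and the first-order difference of the ranking criterion scores is then used to partition the genes into several groups. Recursive feature elimination is applied to the group with the smallest ranking criterion scores to remove noise genes, while sequential forward selection is applied to the group with the largest ranking criterion scores to select informative genes. Experiments on leukemia, colon cancer, and breast cancer microarray data show that the proposed method runs efficiently and classifies well.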
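
The grouping step in entry 1 can be sketched roughly as follows. The abstract does not give the exact splitting rule, so this sketch assumes that genes are sorted by score and a new group starts wherever the first-order difference between neighboring scores exceeds a threshold. The function name `group_by_score_gaps`, the parameter `gap_threshold`, and the example scores are all hypothetical.

```python
def group_by_score_gaps(scores, gap_threshold):
    """Partition gene indices into groups of similar ranking score.

    Assumption (not from the paper): genes are sorted by descending score
    and a new group starts wherever the first-order difference between
    consecutive sorted scores exceeds gap_threshold.
    """
    if not scores:
        return []
    # Gene indices ordered from highest to lowest ranking score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if scores[prev] - scores[cur] > gap_threshold:
            groups.append([cur])    # large gap: open a new group
        else:
            groups[-1].append(cur)  # small gap: same group
    return groups


# Hypothetical ranking scores (for example, squared SVM weights) for six genes.
scores = [0.90, 0.05, 0.85, 0.40, 0.02, 0.45]
groups = group_by_score_gaps(scores, gap_threshold=0.2)
top_group, bottom_group = groups[0], groups[-1]
```

Under the method described above, `top_group` would be the candidate pool for sequential forward selection and `bottom_group` the pool for recursive elimination.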

2.
Classification of gene expression data plays a significant role in the prediction and diagnosis of diseases. Gene expression data are characterized by a mismatch between the gene dimension and the sample dimension. Not all genes contribute to efficient classification of samples. A robust feature selection algorithm is required to identify the important genes that help classify the samples efficiently. Many feature selection algorithms have been introduced in the past to select informative genes (features) based on relevance and redundancy characteristics. Most of the earlier algorithms require a computationally expensive search strategy to find an optimal feature subset. Existing feature selection methods are also sensitive to the evaluation measures. The paper introduces a novel and efficient feature selection approach, termed ERGS (Effective Range based Gene Selection), based on a statistically defined effective range of each feature for every class. The basic principle behind ERGS is that a higher weight is given to a feature that discriminates the classes clearly. Experimental results on well-known gene expression datasets illustrate the effectiveness of the proposed approach. Two popular classifiers, the Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM), have been used for classification. The proposed feature selection algorithm can be helpful in ranking the genes and is also capable of identifying the most relevant genes responsible for diseases such as leukemia, colon tumor, lung cancer, diffuse large B-cell lymphoma (DLBCL), and prostate cancer.
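
The effective-range idea in entry 2 can be illustrated with a simplified two-class sketch. The abstract does not state the ERGS formulas, so everything here is an assumption for illustration only: the effective range of a feature within a class is taken as mean ± `gamma` × standard deviation, and the weight is one minus the overlap of the two ranges divided by their combined span. The names `effective_range` and `range_weight` are hypothetical.

```python
from statistics import mean, pstdev


def effective_range(values, gamma=1.0):
    """Assumed effective range of one feature within one class: mean +/- gamma * std."""
    m, s = mean(values), pstdev(values)
    return (m - gamma * s, m + gamma * s)


def range_weight(class_a, class_b, gamma=1.0):
    """Weight in [0, 1]; higher means the class ranges overlap less.

    Simplified illustration, not the published ERGS weighting.
    """
    lo_a, hi_a = effective_range(class_a, gamma)
    lo_b, hi_b = effective_range(class_b, gamma)
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    if span == 0:
        return 0.0  # both classes collapse to the same value: no discrimination
    return 1.0 - overlap / span


# Hypothetical expression values of two genes in two classes.
w_separating = range_weight([1.0, 1.2, 0.8], [5.0, 5.2, 4.8])
w_overlapping = range_weight([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

A gene whose class ranges do not overlap receives the maximum weight, which matches the stated principle that clearly discriminating features are ranked higher.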

3.
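Heart sound signals reflect pathological information about the heart and are an important basis for assessing cardiac health. This paper first extracts 145 features from heart sound signals, including time-frequency-domain features and Mel cepstral coefficients, as the input dataset for machine learning. The best-performing classifier among five (random forest, LightGBM, XGBoost, GBDT, and SVM) is then combined with recursive feature elimination for data mining, to identify the important feature set and to compare and analyze its classification performance. Finally, Stacking model fusion is used to optimize the model. Compared with a feature subset of the same size, the mined feature subset improves accuracy, recall, precision, and F1 score by 33.51%, 14.54%, 20.61%, and 24.04%, respectively; fusing the LightGBM and SVM models raises the F1 score to 92.6%. This paper proposes an effective heart sound recognition and classification method and identifies the 8 most important heart sound features, providing a reference for clinical diagnosis.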

4.
Translation of electroencephalographic (EEG) recordings into control signals for brain–computer interface (BCI) systems needs to be based on a robust classification of the various types of information. EEG-based BCI features are often noisy and likely to contain outliers. This contribution describes the application of a fuzzy support vector machine (FSVM) with a radial basis function kernel for classifying motor imagery tasks. Statistical features computed over the set of wavelet coefficients were extracted to characterize the time–frequency distribution of the EEG signals. In the proposed FSVM classifier, a low fraction of support vectors was used as the criterion for choosing the kernel parameter and the trade-off parameter, together with the membership parameter, based solely on training data. The FSVM and support vector machine (SVM) classifiers outperformed the winner of the BCI Competition 2003 and other similar studies on the same Graz dataset in terms of the competition criterion, the mutual information (MI), and the FSVM classifier performed better than the SVM approach. The FSVM and SVM classifiers also performed much better than the winner of the BCI Competition 2005 on the same Graz dataset for subject O3 according to the competition criterion, the maximal MI steepness, and the FSVM classifier again outperformed the SVM method. The proposed FSVM model has potential for reducing the effects of noise and outliers in the online classification of EEG signals in BCIs.

5.
A new approach based on a multiclass support vector machine (SVM) with error correcting output codes (ECOC) is presented for the classification of electroencephalogram (EEG) signals. In practical applications of pattern recognition, diverse features are often extracted from the raw data that need to be recognized. Decision making was performed in two stages: feature extraction by eigenvector methods and classification using classifiers trained on the extracted features. The aim of the study is to classify EEG signals by combining eigenvector methods with a multiclass SVM. The purpose is to determine an optimum classification scheme for this problem and to infer clues about the extracted features. The research demonstrated that features obtained by eigenvector methods represent the EEG signals well and that the multiclass SVM trained on these features achieved high classification accuracies.

6.
7.
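A comparative study of four pattern classification methods applied to gene expression profile analysis   Total citations: 1, self-citations: 0, citations by others: 1
Using gene expression profile data with pattern classification methods to identify the types and subtypes of diseases such as cancer is one application of DNA chip technology. In this article, we compare how four pattern classification methods (Fisher linear discriminant, Logit nonlinear discriminant, minimum distance, and K-nearest neighbor) affect disease typing performance under different feature gene selection methods, and we compare their generalization ability. We also study the stability of the classification methods when the sample composition changes. The results show that feature genes selected by the t-test and by classification trees clearly outperform randomly selected genes in all four classifiers; among the four classifiers, the K-nearest neighbor classifier performs best; the minimum-distance classifier and the K-nearest neighbor classifier generalize well; and all four methods are fairly stable under changes in sample composition.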

8.
Brain tumor classification based on long echo proton MRS signals   Total citations: 5, self-citations: 0, citations by others: 5
There has been growing research interest in brain tumor classification based on proton magnetic resonance spectroscopy (1H MRS) signals. Four research centers within the EU-funded INTERPRET project have acquired a significant number of long echo 1H MRS signals for brain tumor classification. In this paper, we present an objective comparison of several classification techniques applied to the discrimination of four types of brain tumors: meningiomas, glioblastomas, astrocytomas grade II and metastases. Linear and non-linear classifiers are compared: linear discriminant analysis (LDA), support vector machines (SVM) and least squares SVM (LS-SVM) with a linear kernel as linear techniques, and LS-SVM with a radial basis function (RBF) kernel as a non-linear technique. Kernel-based methods can perform well on high-dimensional data, which motivates the inclusion of SVM and LS-SVM in this study. The analysis includes optimal input variable selection and (hyper-)parameter estimation, followed by performance evaluation. The classification performance is evaluated over 200 stratified random samplings of the dataset into training and test sets. Receiver operating characteristic (ROC) curve analysis measures the performance of binary classification, while for multiclass classification we use accuracy as the performance measure. Based on the complete magnitude spectra, automated binary classifiers reach an area under the ROC curve (AUC) of more than 0.9, except for the hard case of glioblastomas versus metastases. Although we found no statistically significant difference between the performances of LDA and the kernel-based methods on the available long echo 1H MRS data, the latter have the advantage that no dimensionality reduction is required to obtain such high performance.

9.
OBJECTIVE: This study investigates the use of automated pattern recognition methods on magnetic resonance data, with the ultimate goal of assisting clinicians in the diagnosis of brain tumours. Recently, the combined use of magnetic resonance imaging (MRI) and magnetic resonance spectroscopic imaging (MRSI) has been shown to improve the accuracy of classifiers. In this paper we extend previous work, which used only binary classifiers to assess the type and grade of a tumour, to a multiclass classification system that yields class probabilities. The important problem of input feature selection is also addressed. METHODS AND MATERIAL: Least squares support vector machines (LS-SVMs) with a radial basis function kernel are applied and compared with linear discriminant analysis (LDA). Both a Bayesian framework and cross-validation are used to infer the parameters of the LS-SVM classifiers. Four different techniques to obtain multiclass probabilities as a measure of accuracy are compared. Four variable selection methods are explored. MRI and MRSI data are selected from the INTERPRET project database. RESULTS: The results illustrate the significantly better performance of automatic relevance determination (ARD), in combination with LS-SVMs in a Bayesian framework and coupling of class probabilities, compared to classical LDA. CONCLUSION: It is demonstrated that binary LS-SVMs can be extended to a multiclass classifier system that yields class probabilities by means of Bayesian techniques and pairwise coupling. Feature selection based on ARD further improves the results. This classifier system can be of great help in the diagnosis of brain tumours.

10.
Gene selection is an important issue in analyzing multiclass microarray data. Among the many proposed selection methods, the traditional ANOVA F test statistic has been employed to identify informative genes for both class prediction (classification) and class discovery problems. However, the F test statistic assumes equal variances across classes. This assumption may not be realistic for gene expression data. This paper explores alternative test statistics that can handle heterogeneity of the variances. We study five such test statistics, including the Brown-Forsythe test statistic and the Welch test statistic. Their performance is evaluated and compared with that of the F statistic over different classification methods applied to publicly available microarray datasets.
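
To make the contrast in entry 10 concrete, the sketch below computes the classical one-way ANOVA F statistic and the Welch statistic for a single gene. It uses the standard textbook formulas, not code from the paper; the function names and the example values are hypothetical. Each group needs at least two samples and a non-zero variance.

```python
from statistics import mean, variance


def anova_f(groups):
    """Classical one-way ANOVA F statistic (assumes equal class variances)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)
    return between / within


def welch_f(groups):
    """Welch's statistic: each class is weighted by n_i / s_i^2."""
    k = len(groups)
    weights = [len(g) / variance(g) for g in groups]
    w_sum = sum(weights)
    means = [mean(g) for g in groups]
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_sum
    numerator = sum(w * (m - weighted_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_sum) ** 2 / (len(g) - 1)
              for w, g in zip(weights, groups))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return numerator / denominator


# Hypothetical expression values of one gene in three classes with unequal spread.
gene = [[1.0, 1.1, 0.9, 1.0], [2.0, 3.5, 0.5, 2.0], [4.0, 4.2, 3.8, 4.0]]
f_classic, f_welch = anova_f(gene), welch_f(gene)
```

Ranking genes by `f_welch` instead of `f_classic` reduces the influence of classes whose expression values are highly variable, which is the motivation the abstract gives for the alternatives.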

11.
ABSTRACT: BACKGROUND: In Traditional Chinese Medicine (TCM), lip diagnosis is an important diagnostic method that has a long history and is widely applied. A person's lip color is considered a symptom that reflects the physical condition of organs in the body. However, the traditional diagnostic approach is mainly based on observation with the doctor's naked eye, which is non-quantitative and subjective. The non-quantitative approach depends largely on the doctor's experience and affects the accuracy of diagnosis and treatment in TCM. Developing new quantification methods to identify the exact syndrome based on TCM lip diagnosis has become urgent and important. In this paper, we design a computer-assisted classification model to provide an automatic and quantitative approach to TCM diagnosis based on lip images. METHODS: A computer-assisted classification method is designed and applied for syndrome diagnosis based on lip images. Our purpose is to classify the lip images into four groups: deep-red, red, purple and pale. The proposed scheme consists of four steps: lip image preprocessing, image feature extraction, feature selection and classification. The 84 extracted features comprise lip color space components, texture features and moment features. Feature subset selection is performed using SVM-RFE (Support Vector Machine with recursive feature elimination), mRMR (minimum Redundancy Maximum Relevance) and IG (information gain). The classification model is constructed from the collected lip image features using multi-class SVM and weighted multi-class SVM (WSVM). In addition, we compare the diagnostic performance of SVM with that of the k-nearest neighbor (kNN) algorithm, the Multiple Asymmetric Partial Least Squares Classifier (MAPLSC) and Naive Bayes. All displayed face images were obtained with the participants' consent. RESULTS: A total of 257 lip images are collected for modeling lip diagnosis in TCM. The feature selection method SVM-RFE selects 9 important features: 5 color component features, 3 texture features and 1 moment feature. SVM, MAPLSC, Naive Bayes and kNN showed better classification results with the 9 selected features than with all 84 features. The total classification accuracy of the five methods is 84%, 81%, 79%, 81% and 77%, respectively, so SVM achieves the best classification accuracy. The classification accuracy of SVM is 81%, 71%, 89% and 86% on the Deep-red, Pale, Purple and Red lip image models, respectively. With the feature selection algorithms mRMR and IG, WSVM achieves the best total classification accuracy. The results therefore show that the system achieves its best classification accuracy when SVM classifiers are combined with the SVM-RFE feature selection algorithm. CONCLUSIONS: A diagnostic system is proposed that first segments the lip from the original facial image using the Chan-Vese level set model and the Otsu method, and then extracts three kinds of features (color space features, Haralick co-occurrence features and Zernike moment features) from the lip image. SVM-RFE is adopted to select the optimal features. Finally, SVM is applied to classify the four classes. We also compare different feature selection algorithms and classifiers to verify our system. The developed automatic and quantitative TCM diagnosis system is effective in distinguishing four lip image classes: Deep-red, Purple, Red and Pale. This study puts forward a new method and idea for the quantitative examination of lip diagnosis in TCM and provides a template for objective diagnosis in TCM.

12.
Classifiers have been widely used to select an optimal subset of feature genes from microarray data for accurate classification of cancer samples and for cancer-related studies. However, the classification rules derived from most classifiers are complex and difficult to interpret biologically. How to solve this problem is a new challenge. In this paper, a new classification model based on gene pairs is proposed to address the problem. Experimental results on several microarray datasets demonstrate that the proposed classification model performs well in finding a large number of excellent feature gene pairs. A 100% LOOCV classification accuracy can be achieved using a single classification model based on the optimal feature gene pair or by combining multiple top-ranked classification models. Using the proposed method, we successfully identified important cancer-related genes that had been validated in previous biological studies but were not discovered by the other methods.

13.
A new spike sorting method based on the support vector machine (SVM) is proposed to resolve the superposition problem. Spike superposition is generally resolved by template matching. Previous template matching methods separate the spikes with linear classifiers, whose classification performance is severely affected by the background noise in spike trains. Nonlinear classifiers with high generalization ability are required for the task. A multi-class SVM classifier, composed of several binary SVM classifiers, is therefore applied to separate the spikes. Each binary SVM classifier corresponds to one spike class and is used to identify single and superposition spikes. The superposition spikes are decomposed through template extraction. Experimental results on simulated and real data demonstrate the utility of the proposed method.

14.
Contourlet-based mammography mass classification using the SVM family   Total citations: 1, self-citations: 0, citations by others: 1
This paper is concerned with the design and development of an automatic mass classification system for mammograms. The proposed method consists of three stages. In the first stage, preprocessing is performed to remove the pectoral muscles and to segment regions of interest. In the next stage, the contourlet transform is employed as a feature extractor to obtain the contourlet coefficients. This stage is completed by feature selection based on a genetic algorithm, resulting in a more compact and discriminative texture feature set. This improves the accuracy and robustness of the subsequent classifiers. In the final stage, classification is performed with a successive enhancement learning (SEL) weighted SVM, a support vector-based fuzzy neural network (SVFNN), and a kernel SVM. The proposed approach is applied to the Mammographic Image Analysis Society (MIAS) dataset, and classification accuracies of 96.6%, 91.5% and 82.1% are obtained in an efficient computational time by the SEL weighted SVM, SVFNN and kernel SVM, respectively. Experimental results illustrate that contourlet-based feature extraction in conjunction with state-of-the-art classifiers constitutes a powerful, efficient and practical approach to automatic mass classification of mammograms.

15.
We report the application of a support vector machine (SVM) to the development of diagnostic algorithms for optical diagnosis of cancer. Both linear and nonlinear SVMs have been investigated for this purpose. We develop a methodology that uses SVM for feature extraction and classification jointly by integrating the newly developed recursive feature elimination (RFE) into the SVM framework. This leads to significantly improved classification results compared to those obtained when an independent feature extractor such as principal component analysis (PCA) is used. The integrated SVM-RFE approach is also found to outperform traditional Fisher's linear discriminant (FLD)-based algorithms. All the algorithms are developed using spectral data acquired in a clinical in vivo laser-induced fluorescence (LIF) spectroscopic study conducted on patients being screened for cancer of the oral cavity and on normal volunteers. The best sensitivity and specificity values provided by the nonlinear SVM-RFE algorithm over the data sets investigated are 95% and 96% toward cancer for the training set data, based on leave-one-out cross validation, and 93% and 97% toward cancer for the independent validation set data. When tested on the spectral data of the uninvolved oral cavity sites from the patients, it yielded a specificity of 85%.

16.
Gene selection is important for cancer classification based on gene expression data because of the high dimensionality and small sample size. In this paper, we present a new gene selection method based on clustering, in which dissimilarity measures are obtained through kernel functions. It iteratively searches for the best gene weights while optimizing the clustering objective function. An adaptive distance is used, which makes it possible to learn the gene weights during the clustering process and improves the performance of the algorithm. The proposed algorithm is simple and does not require any modification or parameter optimization for each dataset. We tested it on eight publicly available datasets using two classifiers (support vector machine and k-nearest neighbor) and compared it with six other competitive feature selectors. The results show that the proposed algorithm achieves better accuracies and may be an efficient tool for finding possible biomarkers in gene expression data.

17.
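A study of the effectiveness of a decision-tree feature gene selection method for SVM   Total citations: 6, self-citations: 4, citations by others: 6
Gene chips, an emerging biotechnology, provide a powerful means of studying disease pathogenesis at the molecular level and of diagnosing disease clinically. Feature gene selection is one of the most important steps in pattern-recognition-based disease diagnosis, but different feature gene selection methods often affect the performance of disease pattern classification methods. To address this issue, this study analyzes colon cancer gene expression profile data to examine how EFST, an ensemble feature gene selection method based on recursive decision trees, affects the capability of the support vector machine (SVM) pattern classifier. The influence of the EFST feature selection algorithm on SVM pattern classification is studied from four aspects: classifier performance before and after feature gene selection, agreement of the support vectors, agreement of the labels of misclassified samples, and the effect on classifier stability when the samples are uniformly doubled. The generalization ability of the SVM pattern classifier is also examined. The results show that the decision-tree-based feature gene selection algorithm EFST clearly improves the performance of SVM pattern classification, and that the SVM pattern classifier has strong generalization ability.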

18.
OBJECTIVE: Recently, gene expression profiling using microarray techniques has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples. This poses a great challenge for machine learning and statistical techniques. The support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to give the user a transparent decision process. Explaining the computed solutions and presenting the extracted knowledge is a main obstacle for SVM. MATERIAL AND METHODS: A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanatory capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem. A shrinkage approach, 1-norm based linear programming, is proposed to obtain sparse parameters and the corresponding selected features. We propose a novel rule extraction approach that uses the information provided by the separating hyperplane and the support vectors to improve the generalization capacity and comprehensibility of the rules and to reduce the computational complexity. RESULTS AND CONCLUSION: Two public gene expression datasets, the leukemia dataset and the colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.

19.
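Heart sounds are among the medical signals commonly used to diagnose cardiovascular disease. This paper studies the binary classification of heart sounds as normal or abnormal and proposes a heart sound classification algorithm based on joint decision-making by extreme gradient boosting (XGBoost) and a deep neural network, achieving feature selection and a further improvement in model accuracy. First, the preprocessed heart sound signals are segmented, and five broad categories of features are extracted on that basis. The first four categories undergo feature selection by recursive feature elimination and serve as the input to the XGBoost classifier; the last category, Mel-frequency cepstral coefficients (MFCC), serves as the input to a long short-term memory (LSTM) network. To account for the imbalance of the dataset, a weighted modification is used in both classifiers. Finally, a heterogeneous ensemble decision method produces the prediction. The proposed algorithm was applied to the public heart sound database used in the PhysioNet cardiology challenge (CINC) launched by the PhysioNet website in 2016, and tested for sensitivity, specificity, modified accuracy and F-score; the results were 93%, 89.4%, 91.2% and 91.3%, respectively. Compared with the results of other researchers using machine learning, convolutional neural networks (CNN) and other methods, accuracy and sensitivity are clearly improved, which shows that the proposed method can effectively improve the accuracy of heart sound classification and has great potential for clinically assisted diagnosis of some cardiovascular diseases.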

20.
In high-dimensional data, only a small number of features are significantly correlated with the classification. An ensemble feature selection method based on cluster grouping is proposed in this paper. Classification-related features are chosen using a ranking aggregation technique. These features are divided into unrelated groups by an affinity propagation clustering algorithm with the bicor correlation coefficient. Diverse and discriminative feature subsets are constructed by randomly selecting one feature from each group and are used to train base classifiers. Finally, the base classifiers with better classification performance are selected using the kappa coefficient and integrated using a majority voting strategy. Experimental results on five gene expression datasets show that the proposed method has low classification error rates, stable classification performance and strong scalability in terms of the sensitivity, specificity, accuracy and G-Mean criteria.
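
Two steps from entry 20, drawing one feature at random from each group and combining base classifiers by majority vote, can be sketched as below. This is a minimal illustration under stated assumptions: the clustering, the base classifiers and the kappa-based selection are omitted, and the function names, the feature groups and the predictions are hypothetical.

```python
import random
from collections import Counter


def sample_feature_subsets(groups, n_subsets, seed=0):
    """Build feature subsets by drawing one feature at random from every group."""
    rng = random.Random(seed)
    return [[rng.choice(group) for group in groups] for _ in range(n_subsets)]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    Ties are broken in favor of the label seen first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical groups of mutually correlated gene indices.
feature_groups = [[0, 3, 7], [1, 4], [2, 5, 6, 8]]
subsets = sample_feature_subsets(feature_groups, n_subsets=3)

# Hypothetical predictions of three base classifiers on three samples.
combined = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
```

Because every subset takes exactly one feature per group, the features inside a subset are weakly correlated with each other, while different subsets still differ, which is what gives the ensemble its diversity.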
