Similar Articles
20 similar articles found.
1.
In high-dimensional data, only a small number of features are significantly correlated with the class label. This paper proposes an ensemble feature selection method based on cluster grouping. Classification-related features are first chosen with a rank aggregation technique and then partitioned into mutually unrelated groups by affinity propagation clustering using the biweight midcorrelation (bicor) coefficient. Diverse, discriminative feature subsets are constructed by randomly selecting one feature from each group, and each subset is used to train a base classifier. Finally, the better-performing base classifiers are selected using the kappa coefficient and combined by majority voting. Experiments on five gene expression datasets show that, in terms of sensitivity, specificity, accuracy and G-mean, the proposed method achieves low classification error rates, stable classification performance and strong scalability.
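The subset construction, kappa-based selection and majority voting described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the feature groups are already available (the affinity propagation/bicor clustering step is not reproduced), represents each base classifier only by its list of predicted labels, and uses a hypothetical kappa threshold `min_kappa`; the abstract does not say how many classifiers are kept.

```python
import random
from collections import Counter


def random_subsets(groups, n_subsets, seed=0):
    """Build feature subsets by drawing one feature at random from each group."""
    rng = random.Random(seed)
    return [[rng.choice(group) for group in groups] for _ in range(n_subsets)]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Both label lists contain one identical class, so agreement is perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def select_by_kappa(y_val, val_predictions, min_kappa=0.4):
    """Indices of base classifiers whose validation kappa reaches min_kappa (hypothetical threshold)."""
    return [i for i, pred in enumerate(val_predictions) if cohen_kappa(y_val, pred) >= min_kappa]


def majority_vote(predictions):
    """Combine per-classifier label lists sample by sample; ties go to the label seen first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

In use, each subset from `random_subsets` would train one base classifier; `select_by_kappa` is applied to validation predictions, and only the kept classifiers' test predictions are passed to `majority_vote`.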

2.
Contourlet-based mammography mass classification using the SVM family   Total citations: 1, self-citations: 0, other citations: 1
This paper is concerned with the design and development of an automatic mass classification system for mammograms. The proposed method consists of three stages. In the first stage, preprocessing is performed to remove the pectoral muscle and segment regions of interest. In the second stage, the contourlet transform is employed as a feature extractor to obtain contourlet coefficients; this stage is completed by genetic-algorithm-based feature selection, which yields a more compact and discriminative texture feature set and improves the accuracy and robustness of the subsequent classifiers. In the final stage, classification is performed with a successive enhancement learning (SEL) weighted SVM, a support vector-based fuzzy neural network (SVFNN) and a kernel SVM. Applied to the Mammographic Image Analysis Society (MIAS) dataset, the approach achieves classification accuracies of 96.6%, 91.5% and 82.1% with the SEL weighted SVM, SVFNN and kernel SVM, respectively, at low computational cost. The experimental results show that contourlet-based feature extraction combined with state-of-the-art classifiers constitutes a powerful, efficient and practical approach to automatic mass classification of mammograms.

3.
This paper presents a new approach to computer-supported diagnosis of skin tumors in dermatology. High-resolution skin surface profiles are analyzed to automatically distinguish malignant melanomas from nevocytic nevi (moles). In the first step, several types of features characterizing the structure of skin surface profiles are extracted with 2D image analysis methods: texture features based on co-occurrence matrices, Fourier features and fractal features. Feature selection algorithms are then applied to determine suitable feature subsets for recognition. Feature selection is formulated as an optimization problem, and several approaches, including heuristic strategies, greedy algorithms and genetic algorithms, are compared. The quality measure for a feature subset is the classification rate of the nearest neighbor classifier computed with the leave-one-out method. Genetic algorithms give the best results. Finally, neural networks trained with error back-propagation are applied to the selected feature sets, and different network topologies, learning parameters and pruning algorithms are investigated to optimize classification performance. The optimized recognition system achieves a classification rate of 97.7%.
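The subset quality measure described above (the leave-one-out classification rate of a nearest neighbor classifier) can be sketched as follows. The sketch assumes a 1-nearest-neighbor rule with Euclidean distance on the selected feature indices; the abstract does not state the distance or the number of neighbors. A search strategy such as a genetic algorithm would call this function as its fitness.

```python
import math


def subset_distance(a, b, features):
    """Euclidean distance between two samples using only the chosen feature indices."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def loo_1nn_rate(X, y, features):
    """Leave-one-out classification rate of the 1-nearest-neighbor classifier.

    Each sample is classified by its nearest neighbor among the remaining
    samples; the rate is the fraction classified correctly. Requires len(X) >= 2.
    """
    correct = 0
    for i, xi in enumerate(X):
        others = (j for j in range(len(X)) if j != i)
        nearest = min(others, key=lambda j: subset_distance(xi, X[j], features))
        correct += y[nearest] == y[i]
    return correct / len(X)
```

For example, with samples `[[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]]` and labels `[0, 0, 1, 1]`, the subset `[0]` scores 1.0 while `[1]` scores 0.0, so a search would prefer feature 0.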

4.
OBJECTIVE: Diabetic nephropathy is kidney damage caused by diabetes mellitus. It is a common complication and a leading cause of death in people with diabetes. However, the decline in kidney function varies considerably between patients, and the determinants of diabetic nephropathy have not been clearly identified. It is therefore very difficult to predict the onset of diabetic nephropathy accurately with simple statistical approaches such as the t-test or χ² test. To predict the onset accurately, we applied various machine learning techniques, such as support vector machine (SVM) classification and feature selection methods, to an irregular and unbalanced diabetes dataset. A further objective was to visualize the risk factors so that physicians receive intuitive information about each patient's clinical pattern. METHODS AND MATERIALS: We collected medical data from 292 patients with diabetes and preprocessed the irregular data to extract 184 features. To predict the onset of diabetic nephropathy, we compared several classification methods, including logistic regression, SVM, and SVM with cost-sensitive learning. We also applied several feature selection methods to remove redundant features and improve classification performance. For risk factor analysis with SVM classifiers, we developed a new visualization system based on a nomogram approach. RESULTS: Linear SVM classifiers combined with wrapper or embedded feature selection methods gave the best results. Of the 184 features, these classifiers selected the same 39 features and achieved an area under the receiver operating characteristic (ROC) curve of 0.969. The visualization tool presented the effect of each feature on the decision as graphical output.
CONCLUSIONS: From an irregular and unbalanced dataset, the proposed method can predict the onset of diabetic nephropathy about 2-3 months before the actual diagnosis with high predictive performance, which statistical methods such as the t-test and logistic regression could not achieve. In addition, the visualization system gives physicians intuitive information for risk factor analysis. Physicians can therefore benefit from automatic early warnings for each patient and from visualized risk factors, which facilitate the planning of effective and appropriate treatment strategies.
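The area under the ROC curve reported above can be computed directly from classifier scores (for example, SVM decision values). A minimal sketch using the pairwise (Mann-Whitney) formulation, which is exact but quadratic in the number of cases; it is not the authors' evaluation code.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    AUC equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count as 1/2).
    Labels are assumed to be 1 (positive) and 0 (negative).
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an AUC of 0.75, since three of the four positive/negative pairs are ordered correctly.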

5.
OBJECTIVES: The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for classifying liver tissue in non-enhanced computed tomography (CT) images as normal liver (C1), hepatic cyst (C2), hemangioma (C3) or hepatocellular carcinoma (C4). To this end, various CAD architectures based on texture features and ensembles of classifiers (ECs) are comparatively assessed. MATERIALS AND METHODS: A number of regions of interest (ROIs) corresponding to C1-C4 were defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first-order statistics, the spatial gray level dependence matrix, the gray level difference method, Laws' texture energy measures and fractal dimension measurements. Two different ECs were constructed and compared. The first consists of five multilayer perceptron neural networks (NNs), each taking as input one of the computed texture feature sets or its reduced version after genetic-algorithm-based feature selection. The second comprises five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was obtained with appropriate voting schemes, and bootstrap resampling was used to estimate the generalization ability of the CAD architectures from the relatively small available dataset. RESULTS: The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set and the weighted voting scheme. The fused feature set was obtained by applying feature selection to specific subsets of the original feature set.
CONCLUSIONS: The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, all fed with identical feature sets obtained after appropriate feature selection and fusion, may yield an accurate system able to assist the differential diagnosis of focal liver lesions in non-enhanced CT images.
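The weighted voting scheme mentioned above can be sketched as follows. The abstract does not specify how the weights are set, so this sketch assumes each classifier's weight is supplied by the caller (for instance, its validation accuracy); the labels C1-C4 follow the class names in the abstract.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine the class labels predicted by several classifiers for one ROI.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (assumed, e.g. validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per classifier is required")
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    # The label with the largest accumulated weight wins.
    return max(totals, key=totals.get)
```

Unlike a plain majority vote, a single confident classifier can outvote several weak ones: `weighted_vote(["C2", "C2", "C1"], [0.3, 0.3, 0.9])` returns `"C1"`.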

6.
Medical applications are often characterized by a large number of disease markers and a relatively small number of data records. We demonstrate that complete feature ranking followed by selection can substantially reduce data dimensionality and significantly improve the implementation and performance of classifiers for medical diagnosis. We describe a novel approach that ranks all features by predictive quality, using properties unique to learning algorithms based on the group method of data handling (GMDH). An abductive network training algorithm is used repeatedly to select groups of optimum predictors from the feature set at gradually increasing levels of model complexity specified by the user; groups selected earlier are better predictors. The process is then repeated to rank features within individual groups. The resulting full feature ranking can be used to determine the optimum feature subset: starting at the top of the list, features are progressively added until the classification error rate on an out-of-sample evaluation set starts to increase because of overfitting. The approach is demonstrated on two medical diagnosis datasets (breast cancer and heart disease) and compared with other feature ranking and selection methods, with receiver operating characteristic (ROC) analysis used to compare classifier performance. At the default model complexity, dimensionality was reduced by 22% for the breast cancer data and by 54% for the heart disease data, improving overall classification performance. For both datasets, this considerable dimensionality reduction caused no significant reduction in the area under the ROC curve. GMDH-based feature selection has also proved effective with neural network classifiers.
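The stopping rule described above (walk down the ranking, adding features until the out-of-sample error starts to rise) can be sketched as follows. The ranking itself (GMDH-based in the paper) is taken as given, and `error_of` is a hypothetical callback standing in for training and evaluating a classifier on a feature subset.

```python
def select_prefix(ranked_features, error_of):
    """Grow a feature subset along a best-first ranking until the error rises.

    ranked_features: features ordered best-first.
    error_of: hypothetical callable returning the out-of-sample error rate
    of a classifier trained on the given list of features.
    """
    best_subset, best_error = [], float("inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        err = error_of(subset)
        if err > best_error:
            # The error started to increase (overfitting): keep the previous subset.
            break
        best_subset, best_error = subset, err
    return best_subset, best_error
```

Because the rule stops at the first increase, later dips in error are never reached: with errors 0.30, 0.22, 0.18, 0.20, 0.15 for the top 1 to 5 features, it returns the top three features with error 0.18.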

7.
An important characteristic of gene expression data is that the number of genes M far exceeds the number of samples N. Standard statistical methods do not work well when N < M, so new methodologies, or modifications of existing ones, are needed to analyze microarray data. In this paper, we propose a novel procedure for classifying gene expression data that combines dimension reduction by kernel principal component analysis (KPCA) with classification by logistic regression (discrimination). KPCA is a nonlinear generalization of principal component analysis. The proposed algorithm was applied to five gene expression datasets involving human tumor samples. Comparison with other popular classification methods, such as support vector machines and neural networks, shows that the algorithm is very promising for classifying gene expression data.

8.
Gene selection from high-dimensional microarray gene expression data is a statistically challenging problem. Filter approaches to gene selection are popular because of their simplicity, efficiency and accuracy. Because sample sizes are small, all samples are generally used to compute the ranking statistics, and sample selection in filter-based gene selection methods has not been addressed. In this paper, we extend a previously proposed simultaneous sample and gene selection approach. In a backward elimination procedure, a modified logistic regression loss function is used to select relevant samples at each iteration, and these samples are used to compute the T-score that ranks the genes. The method offers a compromise between the T-score and support vector machine (SVM)-based algorithms. Its performance is demonstrated on simulated and real datasets using classification performance, stability and redundancy as criteria. The results indicate that the method improves computational complexity and stability compared with SVM-based methods without compromising classification performance.
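The T-score used above to rank genes is commonly a two-sample t statistic. A minimal sketch of plain filter-style ranking, assuming the unequal-variance (Welch) form and two classes labelled 0 and 1; the paper's iterative sample selection with a modified logistic loss is not reproduced, and its exact T-score variant may differ.

```python
import math
from statistics import mean, variance


def t_score(a, b):
    """Two-sample t statistic (unequal-variance form) for one gene; needs >= 2 values per class."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Constant expression in both classes: infinite separation unless the means agree.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(X, y):
    """Rank genes (columns of X) by |t| between class 0 and class 1, best first."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((abs(t_score(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)]
```

In a sample-selection variant such as the one described above, `rank_genes` would simply be called on the currently selected samples at each iteration.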

9.
A reliable method for cell phenotype image classification   Total citations: 2, self-citations: 0, other citations: 2
OBJECTIVE: Image-based approaches have proven very useful for automated cell phenotype classification, so it is important to develop a method that efficiently quantifies, distinguishes and classifies sub-cellular images. METHODS AND MATERIALS: In this work, invariant local binary patterns (LBP) are applied, for the first time, to the classification of protein sub-cellular localization images. They are tested on three image datasets (available for download) in conjunction with support vector machines (SVMs) and random subspace ensembles of neural networks. The method based on invariant LBP is more accurate than other well-known feature extraction methods; moreover, it does not require the cells to be cropped directly for classification. RESULTS AND CONCLUSION: The experimental results show that the random subspace ensemble of neural networks outperforms the SVM on this problem. Using LBP features alone, the proposed approach achieves accuracies of 85%, 93.9% and 88.4% on the 2D HeLa dataset and the LOCATE endogenous and transfected datasets, respectively; combined with other state-of-the-art methods for cell phenotype image classification, it achieves 94.2%, 98.4% and 96.5%.
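As an illustration of the invariant LBP features described above, here is the simplest rotation-invariant variant: 8 neighbors at radius 1, each code mapped to the minimum over its circular bit rotations, and the image summarized as a histogram of codes. The paper may use other radii, neighbor counts or uniform-pattern mappings; this is a sketch, not its feature extractor.

```python
from collections import Counter

# Offsets of the 8 neighbors, listed in circular order around the center pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def rotate_min(code, bits=8):
    """Smallest value among all circular bit rotations of an LBP code."""
    mask = (1 << bits) - 1
    return min(((code >> r) | (code << (bits - r))) & mask for r in range(bits))


def lbp_ri_histogram(image):
    """Histogram of rotation-invariant LBP codes over the interior pixels.

    image: 2-D list of gray levels; border pixels are skipped for simplicity.
    """
    counts = Counter()
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBORS):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            counts[rotate_min(code)] += 1
    return counts
```

Rotating the local pattern does not change the descriptor: a single bright neighbor gives the same histogram whether it sits at the top-left or at the right of the center pixel.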

10.
A fuzzy set-theoretic methodology is described that serves as a classification preprocessing strategy for supervised feed-forward neural networks. This methodology, fuzzy interquartile encoding, determines the degree to which a feature belongs to each of a collection of fuzzy sets that overlap at the feature's quartile boundaries. These membership values are then used in place of the original feature. The transformation has a normalizing effect on the feature space and is more robust to feature outliers. Its effectiveness is examined on several synthetic datasets with various underlying distributions, where fuzzy interquartile encoding consistently improves the discriminatory power of the underlying classifiers. The methodology is also applied to two biomedical datasets on tonsillectomy and/or adenoidectomy patients who may or may not have had a predisposition to excessive bleeding during their operation. In the first dataset, the features are blood test results from a coagulation laboratory, and the class labels are one of three hemostatic defects identified by the reference tests. The second dataset consists of patient responses to a bleeding tendency questionnaire, with normal and abnormal class labels derived from a hematology expert system designed in consultation with a pediatric hematologist. Fuzzy interquartile encoding improved the classification accuracy of the underlying neural network classifier by 11% on the first dataset and by 18% on the second.
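The abstract does not give the membership functions, so the following is only one plausible reading of fuzzy interquartile encoding: triangular fuzzy sets peaking at the minimum, first quartile, median, third quartile and maximum of a feature, so that neighboring sets overlap between consecutive quartile points. The anchor points and triangular shapes are assumptions.

```python
from statistics import quantiles


def quartile_points(values):
    """Anchor points for the fuzzy sets: min, Q1, median, Q3, max (needs >= 2 values)."""
    q1, q2, q3 = quantiles(values, n=4)
    return [min(values), q1, q2, q3, max(values)]


def fuzzy_encode(x, points):
    """Memberships of x in triangular fuzzy sets peaking at each anchor point.

    Neighboring sets overlap between consecutive quartile points and the
    memberships sum to 1. Values beyond the extremes saturate, which bounds
    the influence of outliers.
    """
    mu = [0.0] * len(points)
    if x <= points[0]:
        mu[0] = 1.0
        return mu
    if x >= points[-1]:
        mu[-1] = 1.0
        return mu
    for i in range(len(points) - 1):
        lo, hi = points[i], points[i + 1]
        if lo <= x <= hi:
            if hi == lo:
                mu[i] = 1.0
            else:
                t = (x - lo) / (hi - lo)
                mu[i], mu[i + 1] = 1.0 - t, t
            return mu
    return mu
```

Each raw feature value is replaced by its five membership values, so an extreme outlier such as 100 on a feature spanning 1 to 9 is encoded as `[0, 0, 0, 0, 1]` rather than as a large number.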

11.
We previously introduced a distance (similarity)-based mapping for visualizing high-dimensional patterns and their relative relationships. The mapping exactly preserves the original distances from all points to any two reference patterns in a special two-dimensional coordinate system, the relative distance plane (RDP). Here we extend the RDP mapping from visualization to classification. Several classifiers use the RDP directly, including standard linear discriminant analysis (LDA), nearest neighbor classifiers, and a transvariation probability-based classification method that is natural in the RDP. Several reference directions can also be combined to create new coordinate systems in which arbitrary classifiers can be developed. Confidence in the classification results is increased by cycling through all possible reference pairs and computing a misclassification-based weighted accuracy. Classification results on several high-dimensional biomedical datasets are compared.
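The RDP mapping described above follows from elementary geometry: with the two references placed at (0, 0) and (d12, 0), each pattern is the apex of a triangle with known side lengths. A minimal sketch, assuming Euclidean distances and placing every pattern in the upper half-plane (v >= 0); the original papers' sign convention may differ.

```python
import math


def rdp_map(d1, d2, d12):
    """Map a pattern to the relative distance plane (RDP).

    d1, d2: distances from the pattern to reference patterns R1 and R2.
    d12: distance between R1 and R2 (must be > 0).
    R1 sits at (0, 0) and R2 at (d12, 0); the returned point (u, v) is
    exactly d1 from R1 and d2 from R2.
    """
    u = (d1 ** 2 - d2 ** 2 + d12 ** 2) / (2.0 * d12)
    # Clamp tiny negative values caused by floating-point rounding.
    v = math.sqrt(max(d1 ** 2 - u ** 2, 0.0))
    return u, v


def rdp_coordinates(points, r1, r2):
    """RDP coordinates of high-dimensional points relative to references r1 and r2."""
    d12 = math.dist(r1, r2)
    return [rdp_map(math.dist(p, r1), math.dist(p, r2), d12) for p in points]
```

For example, with references `(0, 0, 0)` and `(2, 0, 0)`, the point `(1, 1, 1)` maps to `(1, sqrt(2))`, which is sqrt(3) from both references in the plane, just as in the original space.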

12.
Biomarker identification by feature wrappers   Total citations: 9, self-citations: 0, other citations: 9
M. Xiong, X. Fang, J. Zhao. Genome Research, 2001, 11(11): 1878-1887
Gene expression studies bridge the gap between DNA information and trait information by dissecting biochemical pathways into intermediate components between genotype and phenotype. These studies open new avenues for identifying complex disease genes and biomarkers for disease diagnosis, and for assessing drug efficacy and toxicity. However, most analytical methods applied to gene expression data are not efficient for biomarker identification and disease diagnosis. In this paper, we propose a general framework that incorporates feature (gene) selection into pattern recognition to identify biomarkers. Within this framework, we develop three feature wrappers, built around linear discriminant analysis, logistic regression and support vector machines, that search the space of feature subsets using the classification error of the wrapped classifier as the measure of goodness of each subset. To carry out this computationally intensive search effectively, we employ sequential forward search and sequential forward floating search algorithms. To evaluate feature selection for biomarker identification, we applied the proposed methods to three datasets. The preliminary results show that composite classifiers built on a few identified biomarkers can attain very high classification accuracy.
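A plain sequential forward search wrapper, as used above, can be sketched as follows. `error_of` is a hypothetical callback that trains and evaluates the wrapped classifier (LDA, logistic regression or SVM in the paper) on a candidate subset; the floating variant, which also tries removing features after each addition, is not shown.

```python
def sequential_forward_search(features, error_of, max_features=None):
    """Greedy sequential forward search for a wrapper-style feature subset.

    features: candidate feature identifiers.
    error_of: hypothetical callable returning the classification error of the
    wrapped classifier for a list of features.
    """
    selected, remaining = [], list(features)
    best_error = float("inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension; the index breaks ties by position.
        scored = [(error_of(selected + [f]), i, f) for i, f in enumerate(remaining)]
        err, _, best_f = min(scored)
        if err >= best_error:
            break  # no candidate improves the error: stop
        selected.append(best_f)
        remaining.remove(best_f)
        best_error = err
    return selected, best_error
```

The greedy search evaluates on the order of n² subsets for n candidates instead of all 2ⁿ, which is what makes wrappers feasible on gene expression data.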

13.
Logistic regression and artificial neural networks are the models of choice in many medical data classification tasks. In this review, we summarize the differences and similarities of these models from a technical point of view and compare them with other machine learning algorithms. We provide considerations useful for critically assessing the quality of these models and of the results based on them. Finally, we summarize our findings on how well a sample of papers from the medical literature meets quality criteria for logistic regression and artificial neural network models.

14.
An intelligent framework is proposed to classify an unknown 12-lead electrocardiogram into one of a number of mutually exclusive and combined diagnostic classes. The framework splits the classification problem into a number of two-class (bi-group) problems, each requiring its own bi-group classifier for one diagnostic class. The bi-group classifiers were generated with neural networks (NNs) and combined through a framework containing an evidential reasoning component that resolves conflicts between the bi-group classifiers. Several feature selection techniques were investigated to generate the most appropriate input vector for the bi-group classifiers. Reducing the original input feature vector was found to enhance the classifiers' generalization to unseen data and to lower the computational requirements of the networks. The framework was compared with a conventional NN classification approach and a rule-based approach, and attained significantly higher classification accuracy: 80.0%, compared with 66.7% for the rule-based technique and 68.0% for the conventional neural approach.

15.
Objective: In recent years, several machine learning approaches have been applied to modeling the specificity of the human immunodeficiency virus type 1 (HIV-1) protease cleavage domain. However, the high-dimensional domain dataset contains only a small number of samples, which can mislead classification modeling and its interpretation. Appropriate feature selection can alleviate this problem by eliminating irrelevant and redundant features, thereby improving prediction performance. Methods: We introduce a new feature subset selection method, FS-MLP, that selects relevant features using multilayer perceptron (MLP) learning. The method first trains an MLP on a training dataset and then selects a feature subset by analyzing the trained MLP with a decompositional approach. It can select a subset of relevant features in high-dimensional, multivariate and nonlinear domains. Results: On five artificial datasets representing four data types, we compared FS-MLP with seven other feature selection methods; FS-MLP performed best in high-dimensional, multivariate and nonlinear domains. On the HIV-1 protease cleavage dataset, FS-MLP selected 14 highly relevant features out of 160 original features. On a validation set of 131 test instances, classifiers using these 14 features reached about 95% accuracy, outperforming the other seven methods in both accuracy and number of features. Conclusions: The experimental results indicate that FS-MLP is effective for analyzing multivariate, nonlinear and high-dimensional datasets such as the HIV-1 protease cleavage dataset. The 14 features selected by FS-MLP also provide useful insights into the HIV-1 cleavage site domain, and FS-MLP is a useful method for computational sequence analysis in general.

16.
We developed and tested a new automated chromosome karyotyping scheme using a two-layer classification platform. Our hypothesis is that by selecting the most effective feature sets and adaptively optimizing classifiers for groups of chromosomes with similar image characteristics, we can reduce the complexity of automated karyotyping and improve its performance and robustness. To this end, we assembled an image database of 6900 chromosomes and implemented a genetic algorithm to optimize the topology of multi-feature artificial neural networks (ANNs). In the first layer of the scheme, a single ANN classifies the 24 chromosome types into seven classes. In the second layer, seven ANNs, adaptively optimized for the seven classes, identify individual chromosomes. The scheme was optimized and evaluated with a "training-testing-validation" method. In the first layer, the classification accuracy on the validation dataset was 92.9%. In the second layer, the accuracy of the seven ANNs ranged from 67.5% to 97.5%: six achieved accuracy above 93.7% and only one performed worse. The maximum difference in classification accuracy between the testing and validation datasets was below 1.7%. The study demonstrates that the new scheme achieves higher and more robust performance in classifying chromosomes.

17.
The purpose of this study is to evaluate transfer learning with deep convolutional neural networks for classifying abdominal ultrasound images. Grayscale images from 185 consecutive clinical abdominal ultrasound studies were assigned to 11 categories based on the text annotation specified by the technologist for each image. Cropped images were rescaled to 256 × 256 resolution and randomized, with 4094 images from 136 studies forming the training set and 1423 images from 49 studies forming the test set. The fully connected layers of two convolutional neural networks based on CaffeNet and VGGNet, previously trained on the 2012 Large Scale Visual Recognition Challenge dataset, were retrained on the training set, while the weights in the convolutional layers of each network were frozen so that they served as fixed feature extractors. Accuracy on the test set was evaluated for each network. A radiologist experienced in abdominal ultrasound also independently classified the test images into the same 11 categories. The CaffeNet network classified 77.3% of the test images correctly (1100/1423), with a top-2 accuracy of 90.4% (1287/1423). The larger VGGNet network classified 77.9% of the test images correctly (1109/1423), with a top-2 accuracy of 89.7% (1276/1423). The radiologist classified 71.7% of the test images correctly (1020/1423). The differences in classification accuracy between each neural network and the radiologist were statistically significant (p < 0.001). The results demonstrate that transfer learning with convolutional neural networks may be used to build effective classifiers for abdominal ultrasound images.
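Top-k accuracy, as reported above, counts a test image as correct when its true category is among the k highest-scoring categories. A minimal sketch; the category names in the example are hypothetical, since the abstract does not list the 11 categories.

```python
def top_k_accuracy(y_true, class_scores, k=2):
    """Fraction of samples whose true class is among the k highest-scoring classes.

    class_scores: for each sample, a dict mapping class name -> score
    (e.g. softmax outputs of a CNN).
    """
    hits = 0
    for label, scores in zip(y_true, class_scores):
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += label in top
    return hits / len(y_true)


y_true = ["liver", "kidney", "spleen"]  # hypothetical category names
class_scores = [
    {"liver": 0.70, "kidney": 0.20, "spleen": 0.10},
    {"liver": 0.50, "kidney": 0.30, "spleen": 0.20},
    {"liver": 0.40, "kidney": 0.35, "spleen": 0.25},
]
```

Here top-1 accuracy is 1/3 and top-2 accuracy is 2/3. The reported figures follow the same definition: 1100 / 1423 is about 0.773, the 77.3% top-1 accuracy of CaffeNet.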

18.
This paper presents an effective classification scheme that combines rough set theory (RST)-based feature selection with a fuzzy least squares support vector machine (LS-SVM) classifier for motion classification based on surface electromyography (sEMG). The wavelet packet transform (WPT) is used to decompose the four-class motion EMG signals into non-overlapping sub-bands, and the energy of each sub-band forms the original feature set. To reduce computational complexity, RST is used to obtain a reduced feature set without compromising classification accuracy. In the feature reduction phase, a cluster separation index (CSI) is introduced to evaluate the performance of the proposed algorithm. The fuzzy LS-SVM is then constructed for the multi-class classification task. Because the RST-based feature selection is independent of the classifier design, classification performance varies with the choice of classifier. We compare the proposed scheme with a commonly used scheme that combines principal component analysis (PCA)-based feature selection with a neural network (NN) classifier. The comparative experiments show that the proposed scheme identifies diverse motions with high accuracy. Its superior performance relative to other feature extraction and selection algorithms and classifiers illustrates the potential of SVM techniques combined with WPT and RST for EMG motion classification.
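The sub-band energy features described above can be sketched with a wavelet packet transform. This sketch assumes the Haar wavelet (the abstract does not name the mother wavelet) and returns sub-band energies in natural rather than frequency order; the RST reduction and fuzzy LS-SVM stages are not shown.

```python
import math


def haar_split(signal):
    """One orthonormal Haar analysis step: (approximation, detail) coefficients.

    For odd-length input, the last sample is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wpt_energies(signal, levels):
    """Energies of the 2**levels wavelet packet sub-bands (natural order).

    Unlike the ordinary wavelet transform, the packet transform splits both
    the approximation and the detail branch at every level.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            if len(node) < 2:
                raise ValueError("signal too short for the requested depth")
            next_nodes.extend(haar_split(node))
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]
```

Because the Haar transform is orthonormal, the sub-band energies sum to the signal energy whenever the signal length is divisible by 2**levels, so the feature vector describes how the EMG energy is distributed across frequency bands.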

19.
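A comparative study of four pattern classification methods applied to gene expression profile analysis   Total citations: 1, self-citations: 0, other citations: 1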
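Using gene expression profile data and pattern classification methods to identify the types and subtypes of diseases such as cancer is one application of DNA chip technology. In this article, under different feature gene selection methods, we compare how four pattern classification methods (Fisher linear discriminant, logit nonlinear discriminant, minimum distance and k-nearest neighbor) perform in disease typing, together with their generalization ability; we also study the stability of these classification methods when the sample composition changes. The results show that feature genes selected with the t-test and with classification trees gave clearly better classification than randomly selected genes in all four classifiers; among the four classifiers, the k-nearest neighbor classifier performed best; the minimum distance classifier and the k-nearest neighbor classifier had strong generalization ability; and all four classifiers were fairly stable with respect to changes in sample composition.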
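The minimum distance and k-nearest neighbor classifiers compared above can be sketched in a few lines. Euclidean distance and the value of k are assumptions; the Fisher and logit discriminants are not shown.

```python
import math
from collections import Counter


def fit_centroids(X, y):
    """Class centroids (mean expression vectors) for the minimum distance classifier."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def min_distance_predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def knn_predict(X, y, x, k=3):
    """Majority label among the k training samples nearest to x."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]
```

The minimum distance classifier keeps only one centroid per class, whereas k-nearest neighbor keeps every training sample; this is one reason the two can behave differently when the sample composition changes.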

20.
Electromyography (EMG) is used in bio-driven systems as a control signal for driving a hand prosthesis or other wearable assistive devices. Extracting informative drive signals involves three main modules: preprocessing, dimensionality reduction and classification. This paper proposes a system that classifies 14 finger movements from a six-channel EMG signal. For each finger movement, a 66-element feature vector was computed from the six-channel EMG signal, after which various feature extraction techniques and classifiers were tested and evaluated. We compared six feature extraction techniques: principal component analysis (PCA), linear discriminant analysis (LDA), uncorrelated linear discriminant analysis (ULDA), orthogonal fuzzy neighborhood discriminant analysis (OFNDA), spectral regression linear discriminant analysis (SRLDA) and spectral regression extreme learning machine (SRELM). We also evaluated seven classifiers: support vector machine (SVM), linear classifier (LC), naive Bayes (NB), k-nearest neighbors (KNN), radial basis function extreme learning machine (RBF-ELM), adaptive wavelet extreme learning machine (AW-ELM) and neural network (NN). The combination of SRELM for feature extraction and NN for classification yielded the best classification accuracy, 99%, which was significantly higher than that of the other combinations tested.
Graphical abstract: Mean classification accuracies for 14 finger movements obtained with various pairings of SRELM and a classifier.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417