Similar articles
18 similar articles found.
1.
Cai X, Wei J, Wen G, Li J. Journal of Biomedical Engineering (《生物医学工程学杂志》), 2011, 28(6): 1213-1216
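Gene expression profile data have few samples, high dimensionality and high noise, so dimensionality reduction is essential. Because gene expression profiles are high-dimensional, nonlinear vectors, traditional dimensionality reduction methods cannot project some high-dimensional data with low intrinsic dimensionality into a low-dimensional space. This paper therefore introduces a locally linear embedding (LLE) algorithm with an improved distance to reduce their dimensionality. The original LLE method is very sensitive to the number-of-neighbors parameter. To make the algorithm more robust to this parameter, the paper proposes an improved distance for measuring the distance between sample points, which reduces the influence of uneven sample distribution on the algorithm. Experimental results show that LLE with the improved distance effectively extracts discriminative feature information and greatly reduces the dimensionality of gene data while maintaining high classification accuracy.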
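The core of standard LLE is the per-point reconstruction step. Each sample is written as an affine combination of its k nearest neighbors, and the weights minimize the reconstruction error subject to summing to one. The sketch below computes those weights in plain Python under two assumptions. First, neighbors are ordinary Euclidean neighbors: the paper's improved distance is not described in the abstract and is not reproduced. Second, the helper names `solve_linear` and `lle_weights` and the regularization factor `reg` are illustrative choices, not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def lle_weights(x, neighbors, reg=1e-3):
    """Standard LLE reconstruction weights of point x from its neighbors.

    Assumption: neighbors were already chosen (e.g. by Euclidean distance);
    the paper's improved distance is not reproduced here.
    """
    k = len(neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    # Local Gram matrix C[j][l] = (x - n_j) . (x - n_l)
    c = [[sum(p * q for p, q in zip(diffs[j], diffs[l])) for l in range(k)]
         for j in range(k)]
    # C is singular when k exceeds the dimension or neighbors are collinear,
    # so add a small multiple of its trace to the diagonal.
    trace = sum(c[j][j] for j in range(k))
    eps = reg * trace if trace > 0 else reg
    for j in range(k):
        c[j][j] += eps
    w = solve_linear(c, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]  # enforce sum(w) == 1
```

For example, `lle_weights([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]])` gives approximately `[0.5, 0.5]`, because the point is the midpoint of its two neighbors.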

2.
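Cancer gene expression data are high-dimensional with small sample sizes, so dimensionality reduction is necessary. Traditional linear dimensionality reduction methods cannot discover nonlinear relationships between data points, and they reduce dimensionality poorly. This paper therefore introduces a multiple-weights locally linear embedding algorithm with an improved distance (DMLLE). The algorithm uses an improved distance to find the neighbors of each data point. For each neighborhood, it introduces multiple sets of linearly independent local weight vectors for linear reconstruction. It then obtains the low-dimensional embedding of the high-dimensional data by minimizing the reconstruction error. Experimental results show that DMLLE reduces the dimensionality of cancer gene expression data effectively.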

3.
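Feature selection for SELDI-TOF protein mass spectra using affinity propagation clustering   Total citations: 3, self-citations: 0, citations by others: 3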
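To analyze high-throughput SELDI-TOF mass spectrometry data effectively and to screen tumor-related protein sites, a feature selection method based on affinity propagation clustering is proposed. The method runs in three stages:

1. The SELDI data are prescreened with a t-test.
2. Affinity propagation clustering and null-space LDA are used for dimensionality reduction and decorrelation.
3. SVM-RFE performs feature selection to identify the protein sites relevant to tumor discrimination.

Classification performance was estimated with four classifiers: SVM, KNN, NB and J4.8. The algorithm was evaluated on the public ovarian cancer datasets OC-WCX2a and OC-WCX2b and on the breast cancer dataset BC-WCX2a from Zhejiang Cancer Hospital. On these three datasets it achieved classification rates of 96.43%, 99.66% and 90.88%, sensitivities of 97.00%, 100% and 96.17%, and specificities of 95.85%, 99.08% and 81.92%, respectively. It also selected 10 protein sites relevant to tumor discrimination for each dataset. The proposed algorithm achieves good classification rates and effectively extracts protein mass spectrum sites with good discriminative power, which can assist cancer diagnosis.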
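The first stage of this pipeline, t-test prescreening, ranks each spectral position by how strongly its intensities differ between the two classes. The sketch below assumes a two-class problem and uses Welch's unequal-variance t statistic; the abstract does not say which t-test variant was used. `t_test_prefilter` and `top_k` are hypothetical names, and the affinity propagation, null-space LDA and SVM-RFE stages are not shown.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for one feature (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Both groups constant: treat as perfectly separated if the means differ
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def t_test_prefilter(class_a, class_b, top_k):
    """Return indices of the top_k features ranked by |t|, largest first.

    class_a, class_b: lists of samples, each sample a list of feature values
    (e.g. intensities at each m/z position of a spectrum). This is a
    hypothetical helper, not the paper's code.
    """
    n_features = len(class_a[0])
    scored = []
    for j in range(n_features):
        t = welch_t([s[j] for s in class_a], [s[j] for s in class_b])
        scored.append((abs(t), j))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:top_k]]
```

The surviving indices would then be passed to the later clustering and SVM-RFE stages.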

4.
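Objective: To make prosthesis systems recognize motion signals faster, a feature selection method based on an optimized ant colony optimization (ACO) algorithm was designed. It reduces the dimensionality of high-dimensional surface electromyography (sEMG) feature vectors and thereby lowers the computational burden. Methods: The mutual information between each feature and the target classes served as the heuristic function. The ant colony algorithm selected the optimal feature subset, and a trained artificial neural network then tested its classification performance. Results: Pattern classification experiments on EMG signals of wrist movements were carried out with 10 healthy subjects. Compared with traditional principal component analysis (PCA), the feature subset selected by this algorithm achieved higher recognition accuracy and markedly reduced the dimensionality of the original feature set. This simplified the classifier structure and reduced computational cost. Conclusion: The method is promising for systems with strict real-time requirements, such as EMG-controlled prostheses.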
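In this method, the ACO heuristic for each feature is its mutual information with the class label. The sketch below shows only that heuristic term. It assumes continuous sEMG features are first discretized into equal-width bins; the abstract does not specify how mutual information was estimated, so the binning is an assumption. `discretize` and `mutual_information` are illustrative names, and the ant colony search and neural-network evaluation are not shown.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning of a continuous feature (assumed estimator choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(feature, labels):
    """I(F; C) in bits between a discrete feature and discrete class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    pf = Counter(feature)
    pl = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        mi += p_fc * math.log2(p_fc / ((pf[f] / n) * (pl[c] / n)))
    return mi
```

A heuristic vector for the ants could then be built as `[mutual_information(discretize(col), labels) for col in columns]`, where `columns` holds one list of values per feature.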

5.
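A brain-computer interface (BCI) can control external devices directly through electroencephalogram (EEG) signals. Traditional principal component analysis (PCA) and two-dimensional PCA (2DPCA) have limitations when processing multichannel EEG signals. To address them, this paper proposes a tensor feature extraction and classification framework based on multilinear principal component analysis (MPCA). Tensor EEG data are generated first. Tensor dimensionality reduction and feature extraction are performed next, and a Fisher linear discriminant analysis classifier performs the final classification. The new method was applied to BCI Competition II dataset 4 and BCI Competition IV dataset 3, using a second-order spatio-temporal tensor representation and a third-order spatio-temporal-spectral tensor representation of the EEG data. After repeated tuning of the adjustable parameters, it achieved best results higher than those of other comparable dimensionality reduction methods. With second-order input, the highest accuracies were 81.0% and 40.1%, respectively; with third-order input, they were 76.0% and 43.5%.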

6.
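A microarray analysis method based on independent component analysis (ICA) and random forest classification is proposed. The method first uses ICA to capture higher-order statistical information and extract microarray data features, which reduces dimensionality. Random forests then classify the samples using the extracted features. Numerical results show that with only 5 extracted features, the out-of-bag (OOB) classification error rate is 7.89%. The method effectively reduces the dimensionality of the feature space and achieves a high recognition rate. It also improves the robustness and flexibility of the algorithm.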

7.
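Heart disease is common and hard to self-monitor. To address this, an ECG signal feature extraction and classification algorithm is proposed. The algorithm proceeds as follows:

1. The ECG signal is preprocessed by combining the lifting wavelet transform with an improved semi-soft threshold to remove noise.
2. Principal component analysis (PCA) reduces the dimensionality of the denoised ECG signal, and kernel independent component analysis extracts its nonlinear features.
3. In parallel, the discrete wavelet transform extracts frequency-domain features from the denoised signal, and linear discriminant analysis (LDA) reduces the dimensionality of the frequency-domain statistical features.
4. The two feature vectors are combined into a multi-domain feature space.
5. A support vector machine classifies this space, with its parameters tuned by a genetic algorithm.

Experimental results show that the algorithm accurately classifies 5 types of heartbeats, with a classification rate of 99.11%.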

8.
A mid-level fusion classification method for omics data   Total citations: 1, self-citations: 0, citations by others: 1
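Objective: In-depth analysis of omics data helps advance research in areas such as medical diagnosis. Methods that analyze only one type of omics data cannot solve some complex biomedical problems. To exploit several types of omics information for such problems, this paper proposes a mid-level fusion classification method. Methods: Partial least squares (PLS) reduces the dimensionality of each type of omics data separately, and a support vector machine (SVM) then classifies the fused data. Results: The method was tested on two omics datasets: "non-small cell lung cancer vs. renal cancer" and "colorectal cancer vs. colorectal adenoma". On both cancer datasets, it effectively reduced the dimensionality of the high-dimensional omics data. It also achieved high classification accuracy, with an area under the receiver operating characteristic curve above 0.95. Conclusion: The proposed mid-level fusion method can classify cancer samples using multiple types of omics data and can effectively improve the accuracy of disease diagnosis.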

9.
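Clustering microarray gene expression profiles helps discover co-expressed genes, and co-expression is often a property of co-regulated genes. Accurate clustering of gene expression profiles therefore helps uncover regulatory relationships between genes more accurately. This study uses manifold learning methods from machine learning to obtain nonlinearly reduced gene expression data. The methods are isometric mapping (Isomap), locally linear embedding and Laplacian eigenmaps. Clustering methods, namely K-means, fuzzy clustering and self-organizing map neural networks, were then applied to the reduced data. Using a given threshold, 117 co-expressed gene pairs were obtained from 382 clustering results on yeast gene expression data. Likewise, 89 co-expressed gene pairs were obtained from 132 clustering results on gene expression data from human serum tissue cells. The evaluation criteria show that the manifold-learning-based clustering methods perform comparably to earlier methods. They can also be used to discover low-dimensional manifold structure in high-dimensional microarray expression data.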
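One plausible reading of the pair-extraction step works across many clustering runs, such as different manifold embeddings and clustering algorithms. It counts how often two genes land in the same cluster and keeps the pairs whose co-occurrence rate reaches a threshold. The sketch below implements that reading. The threshold semantics and the name `coexpressed_pairs` are assumptions; the abstract does not confirm these details.

```python
from collections import Counter
from itertools import combinations


def coexpressed_pairs(clusterings, threshold):
    """Gene pairs that share a cluster in at least `threshold` of the runs.

    clusterings: list of dicts mapping gene -> cluster id, one dict per
    clustering run. The threshold interpretation is an assumption.
    """
    counts = Counter()
    for assignment in clusterings:
        groups = {}
        for gene, cid in assignment.items():
            groups.setdefault(cid, []).append(gene)
        for members in groups.values():
            for pair in combinations(sorted(members), 2):
                counts[pair] += 1
    n_runs = len(clusterings)
    return sorted(pair for pair, c in counts.items() if c / n_runs >= threshold)
```

Because cluster ids are compared only within a run, the label numbering of different algorithms does not need to match.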

10.
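Objective: Gene expression profile analysis is one of the most important research topics in bioinformatics. It enables tumors of different pathological subtypes to be classified correctly, which matters greatly for tumor diagnosis and treatment. Methods: This paper applies compressed sensing to classify gastric cancer gene expression profile data. A redundant dictionary is constructed from the training data. The sensing matrix is a Gaussian matrix with randomly distributed, normalized row vectors, and it is used to sense both the training and test data. The gene expression profiles are then reconstructed with an orthogonal l2-norm algorithm. Finally, the nearest-neighbor method determines each sample's class in the transform domain, and the result is compared with the sample's actual class. Results: Compared with other classification algorithms such as K-means clustering and SVM, compressed sensing achieved higher classification accuracy and faster classification, and it avoids the feature selection problem. Conclusion: The method offers a useful reference for clinical disease diagnosis and bioinformatics research.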
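The sensing step can be sketched as follows. A Gaussian random matrix with unit-norm rows projects each expression profile to a short measurement vector. A query then takes the label of its nearest training sample in that measurement space. This is a simplified sketch that assumes plain nearest-neighbor matching on the measurements. It omits the redundant-dictionary construction and the orthogonal l2-norm reconstruction described in the abstract, and all function names are hypothetical.

```python
import math
import random


def gaussian_sensing_matrix(m, n, seed=0):
    """m x n matrix with i.i.d. Gaussian entries, each row scaled to unit norm."""
    rng = random.Random(seed)
    rows = []
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row])
    return rows


def sense(phi, x):
    """Compressive measurements y = phi @ x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


def nn_classify(phi, train_x, train_y, query):
    """Label of the training sample whose measurements are closest to the query's.

    Simplification: matching is done directly on measurements; no sparse
    reconstruction is performed.
    """
    y_q = sense(phi, query)
    best = min(range(len(train_x)),
               key=lambda i: math.dist(sense(phi, train_x[i]), y_q))
    return train_y[best]
```

With m much smaller than the number of genes n, each comparison works on m numbers instead of n, which is where a speed advantage would come from.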

11.
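A recognition method based on PCA and SVM is proposed for classifying motor imagery EEG signals under different mental tasks in brain-computer interfaces (BCI). EEG features are extracted with the Hilbert-Huang transform and an AR model. Principal component analysis (PCA) first reduces the dimensionality of these high-dimensional feature vectors, and a support vector machine then performs the classification. The results are compared with those of Fisher linear classification and a probabilistic neural network. Experimental results show that the method achieves higher classification accuracy with low complexity, which makes it suitable for use in BCIs.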
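The PCA step that precedes the SVM can be illustrated with a dependency-free power iteration. It finds the first principal component and projects the feature vectors onto it. This minimal sketch assumes only one component is kept; a real pipeline would keep several, e.g. via deflation or an eigensolver. `first_principal_component` and `project` are illustrative names, and the Hilbert-Huang/AR feature extraction and the SVM are not shown.

```python
import math
import random


def first_principal_component(data, iters=200, seed=0):
    """Return (column means, unit vector of the top principal direction)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("covariance is zero along the start vector")
        v = [x / norm for x in w]
    return means, v


def project(data, means, v):
    """Score of each centered sample on direction v (one reduced feature)."""
    return [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in data]
```

The sign of the returned direction is arbitrary, which is harmless because a downstream classifier sees a consistent sign for all samples.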

12.
Gene expression profiles, which represent the state of a cell at the molecular level, have great potential as a medical diagnostic tool. In cancer classification, the available training data sets are generally small compared with the number of genes involved. This small-sample, high-dimensional setting challenges certain classification methods. Feature (gene) selection can extract the genes that directly influence classification accuracy and eliminate genes that have no influence on it. This significantly improves both computational performance and classification accuracy. In this paper, correlation-based feature selection (CFS) and the Taguchi-genetic algorithm (TGA) were combined into a hybrid method. K-nearest neighbor (KNN) classification with leave-one-out cross-validation (LOOCV) was used to calculate classification accuracy on eleven classification profiles. Experimental results show that the proposed method effectively reduced redundant features and achieved superior classification accuracy. On ten of the eleven gene expression data sets, its accuracy was higher than that of other classification methods from the literature.
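The evaluation step named in this abstract is KNN accuracy estimated by leave-one-out cross-validation, which is simple enough to sketch directly. In a wrapper search such as CFS combined with TGA, a function like this would score each candidate gene subset. This is an illustrative sketch: all names are assumptions, including the `features` parameter that restricts scoring to a candidate subset. The CFS and Taguchi-genetic search themselves are not shown.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training samples nearest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(xs, ys, k=1, features=None):
    """Leave-one-out accuracy of KNN, optionally on a subset of feature indices."""
    if features is not None:
        xs = [[x[j] for j in features] for x in xs]
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # hold out sample i
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)
```

LOOCV fits small gene-expression sample sizes because every sample is used for testing exactly once without shrinking the training set much.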

13.
Gene expression data reflect nonlinear interactions among genes and environmental factors. Computational analysis of these data is expected to yield knowledge of gene functions and disease mechanisms. Clustering is a classical exploratory technique for discovering similar expression patterns and functional modules. However, gene expression data usually have high dimensionality and relatively few samples, which is the main difficulty in applying clustering algorithms. Principal component analysis (PCA) is commonly used to reduce data dimensionality before clustering. PCA, however, estimates the similarity between expression profiles using the Euclidean distance and therefore cannot reveal nonlinear relationships between genes. This paper uses nonlinear dimensionality reduction (NDR) as a preprocessing step for feature selection and visualization, and then applies clustering algorithms to the reduced feature spaces. To assess how effectively NDR captures biologically relevant structure, NDR and PCA are compared on five real cancer expression datasets. Results show that NDR can outperform PCA in visualization and clustering analysis of complex gene expression data.

14.
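Objective: To find a method for extracting differentially expressed genes for tumor-related gene diagnosis and therapy. Methods: Gene expression profile data were preprocessed, and a feature subset of differentially expressed genes was screened with the relative risk method. The distances between samples were computed, the feature genes were weighted and ranked, and redundant genes were filtered out. Finally, a classifier was applied to an ovarian cancer gene dataset to test the method's effectiveness. Results: A 20-dimensional set of feature genes was selected for classification tests. With 3-5, 7 and 12-20 feature genes, classification accuracy reached 100% and the false positive rate fell to 0. This shows good reliability and effective separation of the two sample types. Conclusion: Classifier tests show high classification accuracy, better than that of traditional differential gene expression analysis methods.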

15.
OBJECTIVE: Medical data are often very high dimensional, and depending on the use, some data dimensions may be more relevant than others. When processing medical data, choosing the optimal subset of features is important. It reduces processing cost and also improves the usefulness of the model built from the selected data. This paper presents a data mining study of medical data with fuzzy modeling methods that use feature subsets selected by various indices/methods. METHODS: Three fuzzy modeling methods are employed: the fuzzy k-nearest neighbor algorithm, a fuzzy clustering-based model, and the adaptive network-based fuzzy inference system. A total of 11 indices/methods are used for feature selection. The medical data mined are the Wisconsin breast cancer dataset and the Pima Indians diabetes dataset, and the classification accuracy and computational time are reported. To show how good the best performer is, the global optimum was also found by exhaustively testing all possible three-feature subsets. RESULTS: For the Wisconsin breast cancer dataset, the best accuracy was 97.17%, only 0.25% lower than that obtained by exhaustive testing. For the Pima Indians diabetes dataset, the best accuracy was 77.65%, only 0.13% lower than that obtained by exhaustive testing. CONCLUSION: Feature selection is important in mining medical data, both to reduce processing time and to increase classification accuracy. However, not all combinations of feature selection and modeling methods are equally effective, and the best combination is often data-dependent, as the breast cancer and diabetes data analyzed here show.
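The exhaustive baseline used in this study tests every three-feature subset, which amounts to enumerating all combinations and keeping the best score. The sketch below takes the scoring function as a parameter. In the study, that score would be a fuzzy model's classification accuracy, which is not reproduced here. `evaluate` and `exhaustive_best_subset` are hypothetical names.

```python
from itertools import combinations


def exhaustive_best_subset(n_features, size, evaluate):
    """Score every feature subset of the given size and return the best one.

    evaluate: callable mapping a tuple of feature indices to a score
    (higher is better); in practice, e.g. cross-validated accuracy.
    Returns (best_subset, best_score, number_of_subsets_evaluated).
    """
    best_subset, best_score = None, float("-inf")
    n_evaluated = 0
    for subset in combinations(range(n_features), size):
        score = evaluate(subset)
        n_evaluated += 1
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, n_evaluated
```

The number of candidates grows as C(n, 3), so exhaustive testing is feasible only for small subset sizes. The gap between a selection method's subset and `best_score` is the kind of difference the Results section reports.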

16.
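Breast cancer is one of the leading causes of cancer death among women worldwide. Diagnosis currently relies mainly on doctors examining histopathological images of breast tissue. This is time-consuming and laborious, and it depends on the doctors' expertise and experience, so diagnostic efficiency is unsatisfactory. To address these problems, a deep learning framework based on histological images is designed to improve diagnostic accuracy while reducing doctors' workload. The framework is a classification model based on multi-network feature fusion and sparse dual-relation regularized learning. First, breast cancer images are preprocessed by sub-image cropping and color augmentation. Second, three typical deep convolutional neural networks (InceptionV3, ResNet-50 and VGG-16) extract multi-network deep convolutional features from the pathology images, and these features are fused. Finally, a supervised dual-relation regularized learning method is proposed for feature dimensionality reduction. It exploits two kinds of relations ("sample-sample" and "feature-feature") together with lF regularization. A support vector machine then classifies the pathology images into 4 classes: normal, benign, in situ carcinoma and invasive carcinoma. Validation on 400 breast cancer pathology images from the public ICIAR2018 dataset yielded a classification accuracy of 93%. Fusing multi-network deep convolutional features captures rich image information. Sparse dual-relation regularized learning reduces feature redundancy and noise interference, which improves the model's classification performance.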

17.

Objective

Medical data sets are usually small and have very high dimensionality. Too many attributes will make the analysis less efficient and will not necessarily increase accuracy, while too few data will decrease the modeling stability. Consequently, the main objective of this study is to extract the optimal subset of features to increase analytical performance when the data set is small.

Methods

This paper proposes a fuzzy-based non-linear transformation method that extends the classification-related information in the original attribute values of a small data set. This study then applies principal component analysis (PCA) to the transformed data set to extract the optimal subset of features. Finally, the transformed data with these optimal features are used as input to a learning tool, a support vector machine (SVM). Six medical data sets illustrate the approach: Pima Indians diabetes, Wisconsin diagnostic breast cancer, Parkinson's disease, echocardiogram, BUPA liver disorders, and bladder cancer cases in Taiwan.

Results

This research uses the t-test to evaluate classification accuracy on each individual data set. It uses the Friedman test to show that the proposed method outperforms the other methods across multiple data sets. The experimental results indicate that the proposed method classifies better than either PCA or kernel principal component analysis (KPCA) when the data set is small. They also suggest that creating new purpose-related information can improve analysis performance.

Conclusion

This paper has shown that feature extraction is important as a function of feature selection for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the information available produces better results than the PCA and KPCA approaches.
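The Friedman test used in this study ranks the competing methods separately on each data set. It then checks whether their average ranks differ more than chance would allow. Below is a minimal sketch of the standard Friedman chi-square statistic. It assumes higher accuracy is better and that ties receive averaged ranks. The names are illustrative, and the test values are toy numbers, not the study's results.

```python
def average_ranks(scores):
    """Rank methods on one data set (rank 1 = highest score); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[i][j] = accuracy of method j on data set i.

    Returns (chi2_F, mean rank of each method), where
    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4).
    """
    n, k = len(table), len(table[0])
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return chi2, mean_ranks
```

The statistic is compared with a chi-square distribution with k - 1 degrees of freedom. For k = 3 methods, the upper-tail probability is simply exp(-chi2 / 2).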

18.
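Objective: Early detection of breast cancer is of great importance to patients. To help doctors with early breast cancer screening and diagnosis, this paper extracts microcalcification regions from mammograms by combining wavelet analysis with image texture feature extraction. The aim is to improve detection accuracy while avoiding missed and false detections. Methods: The image feature vector consists of two parts. The first is the energy, entropy, contrast and correlation extracted from the gray-level co-occurrence matrix (GLCM). The second is the variance and energy of the high-frequency coefficients at each level of the wavelet decomposition. A support vector machine is trained on these features to build the optimal classification model. This model then extracts microcalcification regions from mammograms, and the results are evaluated by detection rate and false detection rate. Results: Validation on clinical data shows that combining wavelet analysis with image texture feature extraction effectively extracts microcalcification regions in breast images. Conclusion: Extraction based on both wavelet analysis and gray-level texture features works better than texture feature extraction or wavelet analysis alone. The method is also simple to design, which makes automated breast cancer diagnosis easier to implement.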
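The four GLCM texture features listed in the Methods (energy, entropy, contrast and correlation) can be computed from a normalized co-occurrence matrix, as in the sketch below. It assumes an image already quantized to integer gray levels, a single horizontal offset of one pixel, and a non-symmetric matrix. The abstract does not state the paper's offsets or quantization, and the wavelet-subband features are not shown. Function names are illustrative.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    image: 2-D list of ints in [0, levels). Non-symmetric: only the
    (pixel, neighbor) order given by the offset is counted.
    """
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                counts[image[y][x]][image[y2][x2]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Energy, entropy (bits), contrast and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n) if p[i][j] > 0]
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    # Correlation is undefined for a constant region; report 0.0 in that case
    corr = (sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / (sd_i * sd_j)
            if sd_i > 0 and sd_j > 0 else 0.0)
    return {"energy": energy, "entropy": entropy,
            "contrast": contrast, "correlation": corr}
```

These four values, concatenated with the wavelet-coefficient statistics, would form the per-region feature vector passed to the SVM.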
