Similar literature
 19 similar articles found (search time: 296 ms)
1.
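This paper introduces a new method for predicting the relative solvent accessibility of residues from protein sequence. The method is based on support vector regression and takes local sequence information as input. Unlike most earlier methods, which only classify residues into discrete accessibility states, it predicts a continuous value of relative solvent accessibility and therefore retains more information about the protein's three-dimensional structure than state classification does. The method was tested on three datasets: RS-126, Manesh-215 and CB-513. The best results were obtained by comparing different parameters and window-width models, and prediction quality was measured by mean absolute error and correlation coefficient, among other metrics. Compared with the experimental results of the multilayer feedback neural network method (RVP-Net) under 3-fold cross-validation, the mean absolute error decreased and the correlation coefficient increased on all three datasets. In addition, using multiple sequence alignments as input gave better results than single sequences. On CB-513 the method reaches a mean absolute error of 16.8% and a correlation coefficient of 0.562, versus 18.8% and 0.480 for RVP-Net. These results indicate that support vector regression is an effective tool for protein sequence analysis.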
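
The abstract above describes sliding-window input and evaluation by mean absolute error and correlation coefficient. The following is a minimal sketch of those two pieces only, not the authors' implementation. The function names, the `-` padding character and the use of Pearson correlation are assumptions made for illustration; the regression model itself is omitted.

```python
from math import sqrt


def windows(sequence, width, pad="-"):
    """Return one fixed-width window centred on each residue.

    Assumes `width` is odd; positions beyond the sequence ends are
    filled with `pad` (a hypothetical padding symbol).
    """
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)
```

Each window would then be encoded numerically (for example from a multiple sequence alignment profile) before being passed to a regressor; that encoding is not specified in the abstract and is left out here.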

2.
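Classifying membrane proteins by amino acid composition alone ignores correlation information between sequence residues, and using the conventional support vector machine as the classifier leaves unclassifiable regions in multi-class problems. To address both issues, the amino acid composition, the dipeptide composition and six amino acid correlation coefficients of each protein sequence are computed and combined into the feature vector of the membrane protein sequence, and a fuzzy support vector machine is used as the classifier, which resolves the unclassifiable-region problem of conventional SVMs in multi-class recognition. Tests show that, with the same input features, the fuzzy SVM outperforms the conventional SVM; with the same classifier, the combination of amino acid composition, dipeptide composition and correlation coefficients outperforms using only one or two of these feature types; and the strategy combining the fused features with the fuzzy SVM reaches an overall prediction accuracy of 97% on an independent dataset test, better than many existing classification strategies, making it one of the most effective membrane protein classification methods currently available.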
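
Two of the three feature types named above, amino acid composition and dipeptide composition, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the 20-letter alphabet ordering are hypothetical, residues outside the alphabet are simply not counted in any feature, and the six correlation coefficients are omitted because the abstract does not define them.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Fraction of each of the 20 amino acids in the sequence (20 values)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values).

    Requires a sequence of at least two residues.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1  # number of adjacent pairs
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[key] / total for key in counts]


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Because both compositions are normalised by sequence length, proteins of different lengths map to vectors of the same dimension, which is what a fixed-input classifier such as an SVM needs.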

3.
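Protein fold prediction based on multi-feature fusion   Cited by: 1 (self-citations: 0, citations by others: 1)
Protein fold prediction provides useful information for the heuristic search of protein tertiary structure. Most known fold prediction methods rely on a single feature or a simple combination of several features. This paper uses a multi-feature fusion method to predict 27 fold classes starting from the protein primary sequence. A support vector machine is used as the classifier with an all-versus-all multi-class strategy, and six sample features (amino acid composition, polarity, polarizability, van der Waals volume, hydrophobicity and predicted secondary structure) are fused. The overall accuracy on independent samples is 59.22%, 3.2% higher than the result of Ding et al., which indicates that multi-feature fusion is an effective method for protein fold prediction.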

4.
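The representation of protein sequence features and the machine learning algorithm are two important factors in how well protein structural classes are predicted. Based on two feature extraction methods, k-word statistical frequency and k-segment position distribution, this study fuses the extracted amino acid sequence information and physicochemical property information with protein secondary structure information to build 17-dimensional and 57-dimensional feature sets. It also introduces the idea of multi-agent fusion into the Adaboost.M1 algorithm and proposes a multi-classifier fusion algorithm called Ma-Ada. As a prediction tool for protein structural classes, the algorithm fully exploits the measurement-level information of the individual classifiers and the interactive fusion information among them. Experiments show that Ma-Ada reaches classification rates of 91.3%, 96.8%, 85.3% and 87.2% on the 57-dimensional feature set of the four protein datasets Z277, Z498, 1189 and D640, and 90.6%, 95.8%, 84.8% and 88.3% on the 17-dimensional feature set. Compared with the results of other protein structural class prediction methods, the method achieves good classification rates.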

5.
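Objective: To predict protein structural classes accurately, laying a foundation for the study of protein spatial structure and biological function. Methods: Hidden Markov models (HMMs) were used to predict protein structural classes, with a 3-state HMM and an 8-state HMM constructed. The data came from the protein datasets built by Chou and Zhou, containing 204 and 498 protein sequences respectively, and accuracy was estimated by the leave-one-out method. Results: Both the 3-state and the 8-state HMM predicted the all-α class most accurately; the 3-state HMM in particular exceeded 95%. Compared with the Chou dataset, the Zhou dataset gave higher prediction accuracy for the all-β and α/β classes and an overall prediction rate about 2% higher, but lower accuracy for the α+β class. Conclusion: An HMM that takes the whole protein sequence as input can predict protein structural classes effectively.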

6.
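A Markov chain model is used to statistically model protein solvent accessibility. According to the relative solvent accessibility of the residues in a protein sequence, residues are divided into two classes (surface/interior) or three classes (surface/intermediate/interior) for prediction. Different Markov chain model (MCM) orders and classification thresholds are tried in training and prediction to obtain the best classification. Classification is carried out on two datasets under different thresholds, and the results are compared with existing methods such as neural networks, information theory and support vector machines. The prediction accuracy and correlation coefficient of the method are generally better than or close to those of the other methods; the best results for the two-class and three-class problems reach 78.9% and 67.7%, respectively. The method also has low computational complexity and short running time.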
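
The two-class and three-class labelling described above amounts to thresholding each residue's relative solvent accessibility. A minimal sketch follows; the function names are hypothetical, the thresholds are supplied by the caller because the abstract does not state them, and the choice of which side of a threshold is inclusive is an assumption.

```python
def two_state(rsa, threshold):
    """Label a residue as surface or interior from its relative accessibility.

    `rsa` and `threshold` are fractions in [0, 1]; a value equal to the
    threshold is treated as surface (an assumption).
    """
    return "surface" if rsa >= threshold else "interior"


def three_state(rsa, low, high):
    """Label a residue as interior, intermediate or surface.

    Requires low < high; both are caller-supplied, illustrative thresholds.
    """
    if rsa < low:
        return "interior"
    if rsa < high:
        return "intermediate"
    return "surface"


def label_sequence(rsa_values, threshold):
    """Apply the two-state rule to every residue of one protein."""
    return [two_state(value, threshold) for value in rsa_values]
```

For example, `label_sequence([0.05, 0.40], 0.25)` labels the first residue as interior and the second as surface. The Markov chain model would then be trained on such label sequences.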

7.
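Prediction of protein-protein interactions based on local support vector machines   Cited by: 3 (self-citations: 0, citations by others: 3)
For the problem of predicting protein-protein interactions, we propose a method based on local support vector machines. The method takes full account of the local similarity of protein interaction data and builds a support vector machine model in the neighborhood of each test sample. Tests on two real protein interaction datasets, H. pylori and Human, show that the local SVM method effectively removes the negative influence of irrelevant samples on the test sample, improves the performance of protein interaction prediction, and has some advantages over other methods.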

8.
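Oligomeric proteins have many advantages over monomeric proteins and participate widely in many life processes. This paper proposes a secondary feature extraction method, with a support vector machine as the classifier and a "one-versus-one" multi-class strategy, to classify four types of homo-oligomers using features extracted from the protein primary sequence. Under the jackknife test, the weighted feature set built from secondary features and amino acid composition features reaches a highest overall classification accuracy of 78.41%, which is 13.09% higher than amino acid composition features alone, 6.86% higher than BG, the best feature set in the reference literature, and 5.53% higher than CM1, the best primary feature set. This indicates that secondary feature extraction is a very effective feature extraction method for classifying protein homo-oligomers.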

9.
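Sequence classification methods are widely used in bioinformatics problems such as the identification of transcriptional regulatory elements and protein structure prediction. This study designs a new classification method based on sequence features and applies it to the study of RNA splicing regulatory elements. The method extracts sequence features from known splicing elements and builds a scoring algorithm that predicts the RNA splicing regulatory function of unknown elements. As an application example, known exonic splicing enhancer and silencer (ESE and ESS) octamers are used as experimental data to compare the predictions of this method with those of several commonly used methods. The mean prediction accuracy over three types of computational validation experiments is 93%, which indicates good predictive accuracy, and the transparent prediction structure helps biological interpretation. The study provides a new method for analyzing biological sequence data and a new way to study gene regulation from a bioinformatics perspective.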

10.
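Objective: Predicting protein secondary structure is the basis for predicting protein spatial structure, so improving its prediction rate is important. Methods: A BP neural network was built by combining amino acid hydrophobicity with the position-specific scoring matrix (PSSM), which carries evolutionary information. The data came from the CB513 protein dataset; after removing sequences with fewer than 30 amino acids or containing X or B, 492 protein sequences remained as the dataset. Prediction accuracy was estimated by 4-fold cross-validation, and the results were compared with a neural network that used only the PSSM as input. Results: The network built with hydrophobicity and evolutionary information combined as input considerably improved the prediction accuracy for α-helices, to nearly 79%, with sensitivity and specificity of 79% and 91% respectively. The overall secondary structure prediction accuracy reached 75.96%. Conclusion: A BP network built in this way can improve the prediction accuracy of protein secondary structure, especially for α-helices.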
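
The overall accuracy, sensitivity and specificity reported above can be computed per residue as in the sketch below. The function names and the one-letter state codes (`H` for helix, `E` for strand, `C` for coil) are assumptions for illustration, and the sketch uses the standard definitions of these metrics rather than anything confirmed by the abstract.

```python
def overall_accuracy(true_states, predicted_states):
    """Fraction of residues whose predicted state matches the true state."""
    matches = sum(t == p for t, p in zip(true_states, predicted_states))
    return matches / len(true_states)


def sensitivity_specificity(true_states, predicted_states, state="H"):
    """Sensitivity and specificity for one state, treated as the positive class.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    Raises ZeroDivisionError if the state, or its complement, never occurs.
    """
    tp = fn = fp = tn = 0
    for t, p in zip(true_states, predicted_states):
        if t == state and p == state:
            tp += 1
        elif t == state:
            fn += 1
        elif p == state:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Both functions accept strings or lists of state codes of equal length, one code per residue.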

11.
Protein domains carry information useful for predicting protein structure, function, evolution and design, since a protein sequence may contain several domains, either different ones or copies of the same domain. In this study, we propose an algorithm named SplitSSI-SVM that works in the following steps. First, training and testing datasets are generated to test SplitSSI-SVM. Second, the protein sequence is split into subsequences based on ordered and disordered regions. Protein sequences longer than 600 residues are split into subsequences to investigate the effectiveness of subsequence-based domain prediction. Third, multiple sequence alignment is performed and the secondary structure is predicted using bidirectional recurrent neural networks (BRNN), which consider the interactions between amino acids. The secondary structure information is used to strengthen the domain-boundary signal. Lastly, support vector machines (SVM) are used to classify proteins as single-domain, two-domain or multiple-domain. SplitSSI-SVM was developed to reduce misleading signals and the low domain signal caused by the primary structure of the protein sequence, and to provide accurate classification of protein domains. Its performance is evaluated using sensitivity and specificity on single-domain, two-domain and multiple-domain proteins. The evaluation shows that SplitSSI-SVM achieves better results than other protein domain predictors such as DOMpro, GlobPlot, Dompred-DPS, Mateo, Biozon, Armadillo, KemaDom, SBASE, HMMPfam and HMMSMART, especially for two-domain and multiple-domain proteins.

12.
OBJECTIVE: In this study, we aim to build a classification framework, the CARSVM model, which integrates association rule mining and the support vector machine (SVM). The goal is to benefit from the advantages of both, namely the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, in order to construct an efficient and accurate classifier that mitigates the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. METHOD: In our proposed framework, instead of the original training set, a set of rule-based feature vectors, generated from the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors are a high-quality source of discriminative knowledge that can substantially improve the prediction power of SVM and associative classification techniques. They also make the model easier for users to understand and interpret. RESULTS: We used four datasets from the UCI ML repository to evaluate the performance of the developed system against five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as a real-world application of classification models, we present an extension of CARSVM combined with feature selection for gene expression data. We then describe how this combination provides biologists with an efficient and understandable classifier. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model.
CONCLUSION: The results show that a considerable increase in classification accuracy can be obtained when rule-based feature vectors are integrated into the learning process of the SVM algorithm. Regarding applicability, the results obtained from gene expression analysis indicate that the CARSVM system can be used in a variety of real-world applications with some adjustments.

13.
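Research on prediction methods for protein subcellular localization   Cited by: 2 (self-citations: 0, citations by others: 2)
Predicting the subcellular localization of proteins is important for understanding their function. Three types of classification features (amino acid composition, amino acid pair composition and the position-specific scoring matrix) and two prediction methods (fuzzy k-nearest neighbors and support vector machine) were selected and tested separately. Analysis of the prediction results shows that the position-specific scoring matrix improves the separability of different subcellular organelles, and that the support vector machine makes better use of position-specific scoring matrix features for prediction. Combining amino acid composition and position-specific scoring matrix features with a support vector machine is an effective method for predicting subcellular localization.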

14.
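Application of support vector machine rule extraction to the diagnosis of brain glioma   Cited by: 1 (self-citations: 0, citations by others: 1)
A relatively new data mining technique, the support vector machine, is used to extract diagnostic knowledge about the degree of malignancy of gliomas from brain glioma cases. The glioma dataset contains 280 cases; several of its attributes contain fuzzy values and one has missing values, which makes artificial neural network algorithms prone to overfitting during learning. The support vector machine implements the structural risk minimization principle of statistical learning theory and overcomes the overfitting problem, and its decision surface is a linear hyperplane with a quantitative expression. As a result, the computed results are better than those of commonly used neural network and rule extraction methods, both in mean accuracy on test samples and in the comprehensibility of the extracted knowledge.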

15.
A new spike sorting method based on the support vector machine (SVM) is proposed to resolve the superposition problem. Spike superposition is generally resolved by template matching. Previous template matching methods separate the spikes with linear classifiers, whose classification performance is severely degraded by the background noise in spike trains. Nonlinear classifiers with high generalization ability are required for the task. A multi-class SVM classifier, composed of several binary SVM classifiers, is therefore applied to separate the spikes. Each binary SVM classifier corresponds to one spike class and is used to identify single and superposition spikes. The superposition spikes are decomposed through template extraction. Experimental results on simulated and real data demonstrate the utility of the proposed method.

16.
Protein remote homology detection is a critical step toward annotating protein structure and function. Supervised learning algorithms such as the support vector machine are currently the most accurate methods. Position-specific score matrices (PSSMs) contain rich information about the evolutionary relationships of proteins. However, PSSMs often have different lengths, which makes them difficult to use in machine-learning methods. In this study, a simple, fast and powerful method is presented for protein remote homology detection, which combines the support vector machine with the auto-cross covariance transformation. The PSSMs are converted into a series of fixed-length vectors by the auto-cross covariance transformation, and these vectors are then input to a support vector machine classifier for remote homology detection. Sequence-order effects can be effectively captured by this scheme. Experiments are performed on well-established datasets, and remote homology is simulated at the superfamily and the fold level, respectively. The results show that the proposed method, referred to as ACCRe, is comparable to or even better than state-of-the-art methods in terms of detection performance, and that its time complexity is superior to that of other profile-based SVM methods. The auto-cross covariance transformation provides a novel way of using evolutionary information, which can be widely applied in protein-level studies.
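
The conversion of a variable-length PSSM into a fixed-length vector can be sketched as below. This assumes the commonly used definition of the auto-cross covariance transformation, in which each feature is the lag-`lag` covariance between two PSSM columns averaged over positions; the abstract does not spell out the formula, so the exact normalisation and the function name `acc_transform` are assumptions, not the authors' code.

```python
def acc_transform(pssm, max_lag):
    """Map a PSSM (list of rows, one per residue) to a fixed-length vector.

    For every lag in 1..max_lag and every ordered column pair (i, k) the
    feature is the mean over positions j of
        (pssm[j][i] - mean_i) * (pssm[j + lag][k] - mean_k).
    Pairs with i == k are auto covariances, the rest cross covariances.
    Requires max_lag < len(pssm).
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    means = [sum(row[i] for row in pssm) / length for i in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for i in range(n_cols):
            for k in range(n_cols):
                total = sum(
                    (pssm[j][i] - means[i]) * (pssm[j + lag][k] - means[k])
                    for j in range(length - lag)
                )
                features.append(total / (length - lag))
    return features
```

The output length is `max_lag * n_cols * n_cols` regardless of the number of rows, so a 20-column PSSM of any length gives 400 values per lag, which can then be passed to a classifier that expects fixed-size input.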

17.
Prediction of protein-protein interactions is important for several bioinformatics tasks, though it is not a straightforward problem. In this paper, a framework is presented that predicts protein-protein interactions from protein sequence information alone, using a probabilistic tree-augmented naive (TAN) Bayesian network. The framework also provides a confidence level for every predicted interaction, which is useful for further analysis by biologists. Applied to yeast interaction datasets, the framework is shown to perform better than the support vector machine (SVM). It is implemented as a web server and is available for prediction.

18.
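Molecular subtyping of breast cancer is a decisive reference for treatment. Traditional subtyping methods are invasive and may suffer from false positives, while existing imaging-based subtyping methods have low accuracy. This paper proposes a subtype prediction method that extracts features by transfer learning and combines them with a support vector machine: annotated breast cancer PET/CT images are fused and normalized, features are extracted with an Xception transfer learning network, and a support vector machine performs the classification that determines the subtype. Evaluation on the test set shows that the Xception+SVM model reaches an accuracy of 0.687 and an AUC of 0.787, better than existing imaging-based methods, which supports the effectiveness of the proposed method.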

19.
SNPs (Single Nucleotide Polymorphisms) account for millions of variations in the human genome and are therefore promising tools for disease-gene association studies. However, such studies are constrained by the high cost of genotyping millions of SNPs. A suitable subset of SNPs that accurately represents the remaining SNPs is therefore needed. Many methods have been developed to select such a subset of tag SNPs, but all of them provide only low prediction accuracy. In the present study, a new method is developed and introduced as GA–SVM with parameter optimization. The method uses a support vector machine (SVM) to predict SNPs and a genetic algorithm (GA) to select tag SNPs. It also uses the particle swarm optimization (PSO) algorithm to optimize the C and γ parameters of the support vector machine. Tested experimentally on a wide range of datasets, the method provides better prediction accuracy in identifying tag SNPs than other current methods.
