Similar Literature
20 similar documents found (search time: 468 ms)
1.
In protein fold recognition, the main drawback of hidden Markov models (HMMs) is their reliance on large model architectures, which require large training data sets and substantial computational resources. HMMs should also exploit sequential information about protein secondary structure in order to improve prediction performance and reduce the number of model parameters. We therefore propose a novel HMM-based method for protein fold recognition, called the 9-state HMM. The method (i) reduces the number of states by using secondary-structure information for each fold and (ii) recognizes protein folds more accurately than other HMMs.
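Fold recognition with one HMM per fold typically scores a query sequence under every fold model and assigns the fold whose model gives the highest likelihood. The sketch below shows that scoring step with the standard forward algorithm in log space. The two-state toy models, the reduced H/P alphabet, and the names `forward_log_likelihood` and `classify` are illustrative assumptions; they are not the paper's 9-state architecture or its parameters.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of finite log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the forward algorithm.

    start[i]    : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    All probabilities must be non-zero in this sketch (math.log(0) raises).
    """
    n = len(start)
    # alpha[i] = log P(seq[:t+1], state at step t = i)
    alpha = [math.log(start[i]) + math.log(emit[i][seq[0]]) for i in range(n)]
    for symbol in seq[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + math.log(emit[j][symbol])
            for j in range(n)
        ]
    return logsumexp(alpha)

def classify(seq, models):
    """Return the fold whose HMM assigns seq the highest log-likelihood."""
    return max(models, key=lambda fold: forward_log_likelihood(seq, *models[fold]))

# Hypothetical toy models over a reduced alphabet: H = hydrophobic, P = polar.
models = {
    "fold_A": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
               [{"H": 0.8, "P": 0.2}, {"H": 0.6, "P": 0.4}]),
    "fold_B": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
               [{"H": 0.2, "P": 0.8}, {"H": 0.4, "P": 0.6}]),
}

print(classify("HHHHHHHH", models))  # fold_A
```

Working in log space avoids numerical underflow on long protein sequences, where the raw path probabilities quickly fall below the smallest representable float.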

2.
Protein data contain discriminative patterns that can support many useful applications if they are defined correctly. In this work, sequential pattern mining (SPM) is used for sequence-based fold recognition. Classifying proteins by fold plays an important role in computational protein analysis, since it can help determine the function of a protein whose structure is unknown. Specifically, cSPADE, one of the most efficient SPM algorithms, is used to analyze protein sequences, and a classifier uses the extracted sequential patterns to assign each protein to the appropriate fold category. The method was trained and evaluated on protein sequences from the Protein Data Bank with SCOP annotations. It achieved an overall accuracy of 25% on a classification problem with 36 candidate categories; accuracy rises to 56% when the five most probable folds are considered.
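Sequential pattern mining rests on counting how many sequences support a candidate pattern. cSPADE itself enumerates patterns efficiently using vertical id-lists and lattice decomposition; the sketch below shows only the underlying support test under a maximum-gap constraint, one of the constraints cSPADE can enforce. The function names and toy sequences are hypothetical.

```python
def contains_with_gap(sequence, pattern, max_gap):
    """True if `pattern` occurs in `sequence` as a subsequence whose consecutive
    elements are at most `max_gap` positions apart (max_gap=1 means contiguous)."""
    if not pattern:
        return True
    # All positions where a match of the pattern prefix can currently end.
    ends = {i for i, ch in enumerate(sequence) if ch == pattern[0]}
    for item in pattern[1:]:
        ends = {
            j
            for i in ends
            for j in range(i + 1, min(i + max_gap, len(sequence) - 1) + 1)
            if sequence[j] == item
        }
        if not ends:
            return False
    return True

def support(sequences, pattern, max_gap):
    """Fraction of sequences that contain `pattern` under the gap constraint."""
    hits = sum(contains_with_gap(s, pattern, max_gap) for s in sequences)
    return hits / len(sequences)

toy_sequences = ["MKVLA", "MAKVA", "GKVLM"]
print(support(toy_sequences, "KV", max_gap=1))  # 1.0
print(support(toy_sequences, "KL", max_gap=2))  # 2 of the 3 sequences
```

Tracking every possible end position, rather than greedily taking the first match, matters under a gap constraint: a greedy match can start too early and miss a valid occurrence later in the sequence.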

3.
Methods for optimizing the structural alphabet sequences of proteins (total citations: 1; self-citations: 0; citations by others: 1)
Protein structure prediction based on fragment assembly has made great progress in recent years, and local protein structure prediction is receiving increasing attention. An essential step in local structure prediction is compressing three-dimensional conformations into one-dimensional strings of letters from a structural alphabet. The traditional method assigns each structural fragment the letter with the highest local structural similarity. However, such a locally optimal letter sequence is not guaranteed to produce the globally optimal structure. This study presents two efficient methods for finding the optimal structural-alphabet sequence, that is, the one that models the native structure as accurately as possible. First, a 28-letter structural alphabet is derived by clustering seven-residue fragments in Cartesian space; the average quantization error of the 28 letters is 0.82 Å in terms of root mean square deviation (RMSD). Then two efficient methods, a greedy algorithm and a dynamic programming algorithm, are presented for encoding protein structures as sequences of structural-alphabet letters. They are tested on the PDB using both the alphabet developed in Cartesian coordinate space (our alphabet) and one developed in torsion-angle space (the PB structural alphabet). The results show that both methods find approximately optimal letter sequences while searching only a small fraction of the modeling space. The traditional local-optimization method yields an RMSD of 26.27 Å between the reconstructed and native structures, whereas the greedy algorithm improves the modeling accuracy to 3.28 Å. These results should be helpful for local protein structure prediction.
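Both the quantization error and the traditional local assignment rest on RMSD after optimal superposition. A minimal sketch, assuming fragments and letter prototypes are given as (7, 3) arrays of Cα coordinates; it uses the Kabsch algorithm and implements only the locally optimal baseline, not the authors' greedy or dynamic-programming encoders, which also account for the error of the reconstructed chain. The function names are hypothetical.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (n, 3) coordinate arrays after optimal rigid superposition."""
    p = p - p.mean(axis=0)  # center both point sets on their centroids
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rot @ p.T).T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))

def assign_letters_locally(fragments, alphabet):
    """Locally optimal encoding: give each fragment the letter whose prototype
    has the smallest RMSD to it (the 'traditional' scheme described above)."""
    return [
        min(alphabet, key=lambda letter: kabsch_rmsd(frag, alphabet[letter]))
        for frag in fragments
    ]
```

Averaging `kabsch_rmsd` between each training fragment and its assigned prototype gives the kind of quantization error quoted in the abstract.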

4.
A numerical model able to investigate the influence of biomechanical factors on the long-term secondary stability of implants would be extremely useful for designing new cementless prosthetic devices. A purely biomechanical model of osseo-integration has been developed, formulated as a rule-based adaptation scheme. Because of its complexity, the problem was divided into three steps: preliminary implementation of the model (proof of concept); implementation of the complete model and investigation of its solution; and model validation. This paper describes the first of these steps. The model was implemented as a discrete-state machine, and the few parameters required were taken from the literature. It was then applied to a real clinical case, using a frictional-contact finite element model of a human femur implanted with a cementless anatomical stem. A stable solution was reached within 3 to 15 iterations for all initial positions considered, and similar initial conditions yielded similar final configurations. For all initial configurations except the varus one, the model predicted partial osseo-integration, ranging from 62% (distal fit) to 78% (proximal fit) of the viable interface. This agrees well with values reported in the literature, which never exceed 75% even under the best conditions and which report better clinical results for proximal fit. For the varus configuration, which lacks cortical support, the algorithm predicted complete loosening.

5.
This work describes a hidden Markov model (HMM) with a reduced number of states that simultaneously learns the amino acid sequence and the secondary structure of proteins of known three-dimensional structure. The model is used for two tasks: protein class prediction and fold recognition. The Protein Data Bank, with SCOP annotations, is used to train and evaluate the proposed HMM on a number of protein classes and folds. The results show that the reduced-state HMM classifies proteins as well as, and in some cases better than, an HMM trained on the amino acid sequence alone. The major advantage of the approach is that it uses few states and a low-complexity training algorithm, and is therefore relatively fast.

6.
We introduce a new method for splice site prediction based on support vector machines (SVMs). The SVM represents a new approach to supervised pattern classification and has been applied successfully to a wide range of pattern recognition problems. For splice site prediction, statistical information about RNA secondary structure in the vicinity of splice sites (donor and acceptor sites) is added in order to compare the resulting true-positive and true-negative recognition rates. This comparison shows that adding structural information brought no significant benefit for splice site recognition and even lowered the recognition rate. Our results, obtained with three-fold cross-validation, suggest that the SVM method performs well for splice site identification.
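An SVM needs a fixed-length numeric vector for each candidate splice site. The abstract does not state the sequence encoding, so the sketch below assumes a common choice: one-hot encoding of a window centred on the candidate donor or acceptor position. The window size and the function name are hypothetical.

```python
NUCLEOTIDES = "ACGT"

def one_hot_window(sequence, site, flank):
    """One-hot encode sequence[site - flank : site + flank], four values per base.

    Positions outside the sequence, and ambiguous bases such as N,
    become all-zero groups of four.
    """
    features = []
    for pos in range(site - flank, site + flank):
        base = sequence[pos].upper() if 0 <= pos < len(sequence) else "N"
        features.extend(1.0 if base == nt else 0.0 for nt in NUCLEOTIDES)
    return features

# Window of 2 bases on each side of index 3 in a toy sequence.
print(one_hot_window("AAGGTAAGT", 3, 2))
```

Vectors built this way could be passed to any SVM implementation, for example scikit-learn's `SVC` scored with `cross_val_score(..., cv=3)` to mirror the three-fold cross-validation; that library choice is an assumption, not something the abstract states.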

7.
Methods for predicting protein secondary structure provide information that is useful both in ab initio structure prediction and as additional restraints for fold recognition algorithms. Secondary structure predictions can also guide the design of site-directed mutagenesis studies and help locate potentially functionally important residues. In this article, we propose a multi-modal back-propagation neural network (MMBP) method for predicting protein secondary structure. Using the Knowledge Discovery Theory based on Inner Cognitive Mechanism (KDTICM), we have constructed a compound pyramid model (CPM) composed of three layers of intelligent interfaces that integrate the multi-modal back-propagation neural network (MMBP), a mixed-modal SVM (MMS), a modified Knowledge Discovery in Databases process (KDD*), and other components. CPM is available both as an integrated web server and as a standalone application, and it exploits recent advances in knowledge discovery and machine learning to produce highly accurate secondary structure predictions. On a non-redundant test set of 256 proteins from RCASP256, CPM achieves an average Q3 score of 86.13% (SOV99 = 84.66%). Extensive testing indicates that this is significantly better than any other method currently available. On the RS126 and CB513 data sets, CPM achieves average Q3 scores of 83.99% (SOV99 = 80.25%) and 85.58% (SOV99 = 81.15%), respectively. By using both sequence and structure databases and exploiting the latest machine learning techniques, protein secondary structure can routinely be predicted with an accuracy well above 80%. The CPM program and web server, which perform these predictions, are accessible at http://kdd.ustb.edu.cn/protein_Web/.
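Q3 is the percentage of residues assigned the correct one of three states (helix H, strand E, coil C). A minimal sketch, assuming predictions and observations have already been reduced to these three states (the mapping from DSSP's eight states varies between studies); SOV99 is a segment-overlap measure and is not reproduced here. The function names are hypothetical.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def per_state_q(predicted, observed, state):
    """Accuracy restricted to residues whose observed state is `state`."""
    idx = [i for i, o in enumerate(observed) if o == state]
    if not idx:
        return float("nan")
    return 100.0 * sum(predicted[i] == state for i in idx) / len(idx)

observed = "HHHHEEECCC"
predicted = "HHHCEEECCH"
print(q3(predicted, observed))  # 80.0
print(per_state_q(predicted, observed, "H"))  # 75.0
```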

8.
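Objective: To predict the secondary structure and B-cell epitopes of the TRAM protein, laying the groundwork for preparing anti-mouse TRAM monoclonal antibodies. Methods: Based on the amino acid sequence of TRAM, its secondary structure and B-cell epitopes were predicted using the Goldkey analysis software and the online nnpredict secondary-structure tool. A peptide corresponding to the predicted epitope was synthesized and used as an immunogen to immunize rabbits, and its immunogenicity was tested. Results: An integrated evaluation of multiple parameters showed that residues 216-229 of TRAM satisfy the criteria for hydrophilicity, surface accessibility, and flexibility and lie within an extended-strand or random-coil region of the secondary structure, making this segment the most likely dominant B-cell epitope. The peptide induced high antibody titers, and the resulting polyclonal antibodies were highly specific. Conclusion: Residues 216-229 constitute the dominant B-cell epitope of TRAM, which provides a theoretical basis for producing monoclonal antibodies against short peptides carrying dominant B-cell epitopes.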

9.
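This study builds a model that classifies gastric precancerous lesions systematically and intelligently, to help physicians locate sensitive regions and precancerous polyps. In the proposed method, an improved AlexNet architecture is designed and the convolutional neural network is trained using techniques including data augmentation, Gaussian noise, L2 weight decay, and ReLU activations; the model's performance is then evaluated with accuracy, loss, and confusion-matrix metrics. Tested on 3,677 images of gastric conditions including erosions, polyps, and ulcers, the model achieved a classification accuracy of 89%.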
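The accuracy and confusion matrix used to evaluate such a classifier can be computed directly from true and predicted labels. In the sketch below the class names mirror the image categories in the abstract, but the labels are made-up toy values, not the study's data, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Rows are true classes and columns are predicted classes, in `classes` order."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

classes = ["erosion", "polyp", "ulcer"]
true_labels = ["erosion", "erosion", "polyp", "ulcer", "ulcer", "polyp"]
pred_labels = ["erosion", "polyp", "polyp", "ulcer", "erosion", "polyp"]
m = confusion_matrix(true_labels, pred_labels, classes)
print(m)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(accuracy(m), 3))  # 0.667
```

Off-diagonal cells show which categories the model confuses, which overall accuracy alone hides.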

10.
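Objective: To predict B-cell epitopes of the HA and NA proteins of highly pathogenic avian influenza virus subtype H5N1, providing a basis for designing prophylactic vaccines based on B-cell epitopes. Methods: Based on the HA and NA protein sequences, B-cell epitopes were predicted using the Kyte-Doolittle hydrophilicity scheme, the Emini scheme, the Karplus scheme, and the Jameson-Wolf antigenic index scheme, supplemented by analysis of flexible regions in the secondary structure of the MAGE protein. Results: Six dominant B-cell epitopes were predicted for hemagglutinin (HA) and six for neuraminidase (NA). Conclusion: These B-cell epitopes can provide an experimental basis for developing avian influenza vaccines.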
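Kyte-Doolittle profiles are sliding-window averages of per-residue hydropathy values; stretches with negative (hydrophilic) averages are treated as candidate surface-exposed epitope regions. A minimal sketch, assuming a 7-residue window and a threshold of zero (both hypothetical; prediction tools let the user set them). The Emini, Karplus and Jameson-Wolf schemes combine further parameters and are not reproduced here.

```python
# Kyte-Doolittle hydropathy values; negative means hydrophilic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=7):
    """Mean Kyte-Doolittle value of every full window; entry k covers
    residues k+1 .. k+window (1-based)."""
    scores = [KD[aa] for aa in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def hydrophilic_windows(sequence, window=7):
    """1-based (start, end) residue ranges whose mean hydropathy is below zero."""
    profile = hydropathy_profile(sequence, window)
    return [(i + 1, i + window) for i, v in enumerate(profile) if v < 0]

print(hydrophilic_windows("IIIDDD", window=3))  # [(3, 5), (4, 6)]
```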

11.
Acoustic analysis of voice features can complement invasive, observation-based methods for diagnosing vocal fold pathologies. Choosing an appropriate feature extraction method can significantly improve the diagnostic results for patients with voice disorders. In this paper, the performance of nonlinear dynamics features and acoustic perturbation features is evaluated for distinguishing patients with vocal fold disorders from normal cases. Vocal fold pathology is one of the major causes of reduced voice quality and feature variation in patients with dysphonic voices. Because vocal fold dysfunction severely disrupts the complex dynamical structure of the speech signal, spectral analysis methods are not well suited to characterizing these changes in disordered voices. Measures that reflect the nonlinear nature of such changes in the acoustic signal are therefore an efficient alternative to the conventional methods. To compare the effectiveness of these approaches, we use features such as the correlation dimension, the largest Lyapunov exponent, approximate entropy, fractal dimension, and Ziv-Lempel complexity, and we evaluate their performance against conventional features such as jitter and shimmer in the voice diagnosis task. Using a support vector machine classifier, our simulation results show that the correlation dimension and the largest Lyapunov exponent, which achieve the highest recognition rates (94.44% and 88.89%, respectively), can serve as highly reliable features for the clinical diagnosis of vocal fold pathologies and related applications.
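Of the nonlinear features listed, approximate entropy is the simplest to state: it compares how often length-m patterns that match within tolerance r still match when extended to length m+1. The sketch follows Pincus's definition with self-matches included. Treating r as an absolute tolerance is an assumption of this sketch; in practice r is often set to about 0.2 times the signal's standard deviation.

```python
import math

def approximate_entropy(u, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D series u (self-matches included)."""
    n = len(u)

    def phi(length):
        templates = [u[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of `a`, including `a` itself.
            c = sum(max(abs(x - y) for x, y in zip(a, b)) <= r for b in templates)
            total += math.log(c / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)

regular = [0.0, 1.0] * 100
print(round(approximate_entropy(regular), 4))  # close to 0 for a perfectly regular signal
```

Low values indicate a regular, predictable signal; irregular signals give higher values, which is why ApEn is used to quantify the loss of regularity in pathological voices.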

12.
The secondary structure of glycogen phosphorylase from Escherichia coli has been deduced using Chou-Fasman analysis. Of the 809 amino acid residues, 244 (30%) are predicted to form alpha-helix, 218 (27%) beta-pleated sheet, and 192 (24%) reverse beta turns, distributed throughout the sequence. In total, 27 alpha-helices and 31 beta-pleated sheets are distributed over the molecule. A structure of three consecutive beta-pleated strands joined by two alpha-helices is predicted for residues 325 to 372 of the primary sequence, indicating the presence of a Rossmann-fold supersecondary structure. A tyrosine at position 350 within this supersecondary structure lies in a region predicted to form a reverse beta turn. Several amino acid pairs occur within the stretch of sequence that forms the Rossmann-fold supersecondary structure.

13.
14.
Protein fold prediction based on multi-feature fusion (total citations: 1; self-citations: 0; citations by others: 1)
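Protein fold prediction provides useful information for heuristic searches of protein tertiary structure. Most existing fold prediction methods rely on a single feature type or on a simple combination of several features. This paper uses a multi-feature fusion method to predict 27 fold classes starting from the protein's primary sequence. A support vector machine serves as the classifier, with a many-to-many multi-class strategy. Six feature types are fused to describe each sample: amino acid composition, polarity, polarizability, van der Waals volume, hydrophobicity, and predicted secondary structure. The overall prediction accuracy on the independent test set is 59.22%, an improvement of 3.2% over the result of Ding et al. These results show that multi-feature fusion is an effective approach to protein fold prediction.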
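Before an SVM can use them, feature types like these are turned into fixed-length vectors. The sketch shows two of them: the 20-dimensional amino acid composition and a three-group hydrophobicity composition in the style of the composition/transition/distribution (CTD) descriptors used by Ding and Dubchak. The grouping is the commonly used one. Plain concatenation stands in for the paper's fusion scheme, which the abstract does not describe, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Three hydrophobicity groups as used in Dubchak-style CTD descriptors.
HYDROPHOBICITY_GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}

def aa_composition(seq):
    """20-dimensional amino acid composition (fraction of each residue type)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def group_composition(seq, groups):
    """Fraction of residues falling in each property group (the 'C' of CTD)."""
    return [sum(seq.count(aa) for aa in members) / len(seq)
            for members in groups.values()]

def concatenated_features(seq):
    """Simplest possible combination: concatenate the per-property vectors."""
    return aa_composition(seq) + group_composition(seq, HYDROPHOBICITY_GROUPS)

print(group_composition("AALK", HYDROPHOBICITY_GROUPS))  # [0.25, 0.5, 0.25]
```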

15.
16.
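Objective: To predict the secondary structure and B-cell epitopes of the human Izumo protein. Methods: Based on the human Izumo gene sequence, the secondary structure of the encoded protein was predicted with the Chou-Fasman and Garnier-Robson methods, and the flexibility of the protein backbone was predicted with the Karplus-Schulz method. Hydrophilicity was predicted with the Kyte-Doolittle method, surface probability with the Emini method, and the antigenic index with the Jameson-Wolf method. Results: Both the Chou-Fasman and Garnier-Robson predictions indicate that Izumo contains many α-helices; residues 6-17, 30-40, 88-99, 103-120, 153-160, 173-188, 249-260, 283-297, 334-338, and 339-346 are likely α-helix centers, and residues 21-25, 198-200, 245-248, and 320-323 are likely β-sheet centers. B-cell epitope predictions with the Kyte-Doolittle, Emini, and Jameson-Wolf methods indicate that the regions around residues 36-42, 62-66, 94-99, 118-122, 129-132, 151-154, 161-164, 173-177, 205-208, 212-216, 256-265, 271-276, 283-288, 314-318, and 336-350 are likely dominant B-cell epitope regions. Conclusion: These results will help identify the dominant B-cell epitopes of Izumo and the sites responsible for its immunocontraceptive activity.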

17.
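Objective: Predicting protein secondary structure is the basis for predicting spatial structure, so improving the accuracy of secondary structure prediction is important. Methods: In this study, a back-propagation (BP) neural network was built that combines amino acid hydrophobicity with the position-specific scoring matrix (PSSM), which carries evolutionary information. Data were taken from the CB513 protein set; after removing sequences with fewer than 30 residues or containing X or B, 492 protein sequences remained as the data set. Prediction accuracy was assessed by 4-fold cross-validation, and the results were compared with those of a neural network that uses only the PSSM as input. Results: The network using both hydrophobicity and evolutionary information as input markedly improved α-helix prediction accuracy, reaching nearly 79%, with sensitivity and specificity of 79% and 91%, respectively. The overall secondary structure prediction accuracy reached 75.96%. Conclusion: A BP network constructed in this way can improve the accuracy of protein secondary structure prediction, especially for α-helices.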
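A common way to feed a PSSM plus hydrophobicity into such a network is to slide an odd-length window along the chain and concatenate, for each position in the window, its 20 PSSM scores and one hydrophobicity value. The abstract gives neither the window size nor the scaling, so `window=13`, zero-padding at the termini, and the function name are hypothetical choices.

```python
def window_features(pssm_rows, hydrophobicity, window=13):
    """Build one input vector per residue for a secondary-structure network.

    pssm_rows[i]      : list of 20 PSSM scores for residue i
    hydrophobicity[i] : one hydrophobicity value for residue i
    Each vector concatenates 21 values (20 PSSM + 1 hydrophobicity) for every
    position in the window centred on the residue; positions past either
    terminus are filled with zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    n = len(pssm_rows)
    vectors = []
    for center in range(n):
        vec = []
        for pos in range(center - half, center + half + 1):
            if 0 <= pos < n:
                vec.extend(pssm_rows[pos])
                vec.append(hydrophobicity[pos])
            else:
                vec.extend([0.0] * 21)
        vectors.append(vec)
    return vectors
```

Each residue then becomes one training example of length 21 × window, labelled with that residue's observed secondary-structure state.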

18.
Accurate interpretation of genomic variants that alter RNA splicing is critical to precision medicine. We present a computational framework, Prediction of variant Effect on Percent Spliced In (PEPSI), that predicts the splicing impact of coding and noncoding variants for the Fifth Critical Assessment of Genome Interpretation (CAGI5) "Vex-seq" challenge. PEPSI is a random forest regression model trained on multiple layers of features associated with sequence conservation and regulatory sequence elements. Unlike other splicing-defect prediction tools in the literature, our framework integrates secondary structure information when predicting variants that disrupt splicing regulatory elements (SREs). We applied our model to classify splice-disrupting variants among 2,094 single-nucleotide polymorphisms from the Exome Aggregation Consortium, using the model-predicted changes in percent spliced in (ΔPSI) associated with the tested variants. Benchmarking against widely used state-of-the-art tools, we show that PEPSI achieves comparable sensitivity and precision. We also show that secondary structure context can help resolve several cases in which changes in SRE counts do not match the direction of the ΔPSI measured for the tested variants.
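Two quantities in the abstract can be made concrete: the change in the number of splicing regulatory elements (SREs) that a variant creates or destroys, and ΔPSI. A minimal sketch; the two hexamers are hypothetical placeholders rather than a curated SRE set, and the read-count PSI below is a simplified common definition, not necessarily the one used for the Vex-seq data.

```python
# Hypothetical placeholder motifs, not a curated list of splicing regulatory elements.
TOY_SRE_MOTIFS = {"GAAGAA", "CAAGAA"}

def count_motifs(seq, motifs):
    """Number of (possibly overlapping) occurrences of any motif in seq."""
    return sum(
        1
        for motif in motifs
        for i in range(len(seq) - len(motif) + 1)
        if seq[i:i + len(motif)] == motif
    )

def sre_count_change(ref_seq, alt_seq, motifs):
    """Motif count in the variant sequence minus the count in the reference."""
    return count_motifs(alt_seq, motifs) - count_motifs(ref_seq, motifs)

def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in, as the fraction of reads supporting exon inclusion."""
    return inclusion_reads / (inclusion_reads + exclusion_reads)

ref, alt = "TTGAAGAATT", "TTGAAGTATT"
print(sre_count_change(ref, alt, TOY_SRE_MOTIFS))  # -1
print(psi(15, 15) - psi(30, 10))  # ΔPSI = -0.25
```

The abstract's point is that the sign of the SRE count change and the sign of ΔPSI do not always agree; structural context is what PEPSI adds to explain such cases.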

19.
In this paper, an intelligent hyper-framework is proposed for recognizing protein folds from amino acid sequences, a fundamental problem in bioinformatics. The framework includes several statistical and intelligent algorithms for protein classification. Its main components are the Fuzzy Resource-Allocating Network (FRAN) and a Radial Basis Function network optimized by Particle Swarm Optimization (RBF-PSO). FRAN uses a dynamic method to tune the RBF network parameters and, because of the complexity of the patterns in protein data sets, classifies proteins under fuzzy conditions. RBF-PSO uses PSO to tune the RBF classifier. Experimental results show that FRAN reaches a prediction accuracy of up to 51% and achieves acceptable multi-class results for protein fold prediction. Although RBF-PSO gives reasonable fold recognition accuracy of up to 48%, it is weaker than FRAN in some cases. The proposed hyper-framework, however, makes it possible to use a wide range of intelligent methods and can learn from previous experience. It can thus avoid the weaknesses of individual intelligent methods in terms of memory, computation time, and static structure. Furthermore, the system's performance can be improved throughout its life cycle.
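In RBF-PSO, PSO searches the RBF classifier's parameter space (for instance centres or widths) for values that minimize a training error. A generic global-best PSO with an inertia weight is sketched below. The coefficients are common textbook values, and the objective is a stand-in, since the abstract gives neither the paper's fitness function nor its settings; the function name is hypothetical.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lo, hi]^dim with global-best particle swarm optimization."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]             # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in objective: a quadratic "error" minimized when the parameter equals 1.5.
best, err = pso_minimize(lambda x: (x[0] - 1.5) ** 2, dim=1, bounds=(0.0, 5.0))
```

In a real RBF-PSO setting, `f` would train or evaluate the RBF network with the candidate parameters and return its validation error.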

20.
In an effort to find a structural explanation for the lack of direct transmission of scrapie from sheep to humans, secondary structure predictions are used to locate the segments of the prion sequence that may be involved in the transformation from the normal form of the prion protein, which has high helix content, to the pathogenic form, which has high beta-sheet content. The Chou-Fasman algorithm, which calculates propensities for both helix and sheet formation, was used to predict the secondary structures of the scrapie-resistant and scrapie-susceptible variants of the ovine prion protein. The scrapie-susceptible variant, which has a glutamine at residue 168 (human prion protein numbering), is predicted to have a propensity for sheet formation in that region of the molecule, whereas the scrapie-resistant variant, which has an arginine at position 168, does not. The valine at position 133, additionally present in the ovine variant that is most susceptible to scrapie, is predicted to cause even more sheet formation. When the predicted secondary structure of the human prion protein is compared with those of the ovine variants, the human protein is most similar to the scrapie-resistant variant. This result is proposed as a possible explanation for the observation that scrapie is not directly transmitted from sheep to humans.
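The core quantity in Chou-Fasman analysis is the average helix or sheet propensity over a short window of residues. The sketch computes only that windowed average, not the full nucleation and extension rules, using the standard Chou-Fasman parameters. The segment "VTQYN" is a made-up example, not the ovine prion sequence; it serves only to show how a Gln-to-Arg substitution lowers local sheet propensity (Pβ of Gln is 1.10, of Arg 0.93).

```python
# Chou-Fasman conformational parameters for helix (P_ALPHA) and sheet (P_BETA).
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13, "Q": 1.11,
           "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83,
           "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57}
P_BETA = {"V": 1.70, "I": 1.60, "Y": 1.47, "F": 1.38, "W": 1.37, "L": 1.30, "C": 1.19,
          "T": 1.19, "Q": 1.10, "M": 1.05, "R": 0.93, "N": 0.89, "H": 0.87, "A": 0.83,
          "S": 0.75, "G": 0.75, "K": 0.74, "P": 0.55, "D": 0.54, "E": 0.37}

def window_propensity(seq, table, center, half=2):
    """Mean propensity over residues center-half .. center+half (0-based),
    clipped at the sequence ends."""
    lo, hi = max(0, center - half), min(len(seq), center + half + 1)
    return sum(table[aa] for aa in seq[lo:hi]) / (hi - lo)

with_gln = window_propensity("VTQYN", P_BETA, center=2)
with_arg = window_propensity("VTRYN", P_BETA, center=2)
print(round(with_gln, 3), round(with_arg, 3))  # 1.27 1.236
```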


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417