Similar articles
 20 similar articles found (search time: 31 ms)
1.
The ability to predict the effect of nonsynonymous SNPs (nsSNPs) on protein function is important for the success of genetic disease association studies. Here we present a statistical geometry approach to nsSNP classification based on Delaunay tessellation, whereby the impact of nsSNPs on protein function is correlated with the change in the four-body statistical potential (ΔQ) of the protein caused by the amino acid substitution. We observed that the ΔQ of polymorphic proteins with disease-associated nsSNPs (daSNPs) was on average significantly lower than the ΔQ of the proteins with neutral SNPs (ntSNPs). Clustering amino acid substitutions into conservative and nonconservative groups, and using a three-letter alphabet based on side-chain polarity showed significantly lower ΔQ in nonconservative changes to daSNPs and when hydrophobic residues were substituted by charged or by polar residues. We also found that the daSNPs in the protein core caused much lower ΔQ than surface daSNPs. This approach demonstrates a strong correlation between the computed ΔQ and SNP classification. Integration of our approach with the existing models will help achieve a more precise recognition of nsSNPs that underlie polygenic diseases. All of the programs were written in Java and are available from the authors upon request.

2.
Accounting for human polymorphisms predicted to affect protein function   Total citations: 24 (self-citations: 0; citations by others: 24)
Ng PC, Henikoff S. Genome Research, 2002, 12(3): 436-446
A major interest in human genetics is to determine whether a nonsynonymous single-base nucleotide polymorphism (nsSNP) in a gene affects its protein product and, consequently, impacts the carrier's health. We used the SIFT (Sorting Intolerant From Tolerant) program to predict that 25% of 3084 nsSNPs from dbSNP, a public SNP database, would affect protein function. Some of the nsSNPs predicted to affect function were variants known to be associated with disease. Others were artifacts of SNP discovery. Two reports have indicated that there are thousands of damaging nsSNPs in an individual's human genome; we find the number is likely to be much lower.

3.
Variations are mostly due to nonsynonymous single nucleotide polymorphisms (nsSNPs), some of which are associated with certain diseases. Phenotypic effects of a large number of nsSNPs have not been characterized. Although several methods have been developed to predict the effects of nsSNPs as "disease" or "neutral," there is still a need for development of methods with improved prediction accuracies. We, therefore, developed a support vector machine (SVM) based method named Hansa which uses a novel set of discriminatory features to classify nsSNPs into disease (pathogenic) and benign (neutral) types. Validation studies on a benchmark dataset and further on an independent dataset of well-characterized known disease and neutral mutations show that Hansa outperforms the other known methods. For example, fivefold cross-validation studies using the benchmark HumVar dataset reveal that at the false positive rate (FPR) of 20% Hansa yields a true positive rate (TPR) of 82% that is about 10% higher than the best-known method. Hansa is available in the form of a web server at http://hansa.cdfd.org.in:8080.
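
The headline figure in this abstract, a true positive rate at a fixed false positive rate, can be read off a set of classifier scores roughly as sketched below. This is a minimal illustration and not Hansa's code: the function name `tpr_at_fpr`, the convention that higher scores mean "disease", and the toy scores are all assumptions made for the example.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best true positive rate achievable while keeping FPR <= max_fpr.

    scores: classifier outputs, higher = more likely disease (assumed convention).
    labels: 1 for disease, 0 for neutral.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    best_tpr = 0.0
    # Try every observed score as a threshold (predict disease if score >= t).
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in pos) / len(pos)
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Hypothetical toy data: five disease and five neutral variants.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.25, 0.6, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.20))
```

Scanning every observed score keeps the sketch dependency-free; on a real benchmark a library ROC routine would normally be used instead.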

4.
Mutations in the SMPX gene can disrupt the regular activity of the SMPX protein, which is involved in the hearing process. Recent reports have shown a link between nonsynonymous single-nucleotide polymorphisms (nsSNPs) in SMPX and hearing loss; classifying deleterious SNPs in SMPX is thus an uphill task to be undertaken before designing a more extensive population study. In this study, damaging nsSNPs of SMPX from the dbSNP database were identified by using 13 bioinformatics tools. Initially, the impact of nsSNPs in the SMPX gene was evaluated through different in silico predictors, and the deleterious convergent changes were analyzed by energy-minimization-guided residual network analysis. In addition, the pathogenic effects of mutations in SMPX-mediated protein–protein interactions were also characterized by structural modeling and binding energy calculations. A total of four mutations (N19D, A29T, K54N, and S71L), all located in highly conserved regions, were found to be highly deleterious by all the tools. Furthermore, all four mutants showed structural alterations, and the communities of amino acids for mutant proteins were readily changed, compared to the wild-type. Among them, A29T (rs772775896) was revealed as the most damaging nsSNP, which caused significant structural deviation of the SMPX protein, thereby reducing the binding affinity to other functional partners. These findings provide computational insights into the deleterious role of nsSNPs in SMPX, which might help guide wet-lab confirmatory analysis.

5.
Nakken S, Alseth I, Rognes T. Neuroscience, 2007, 145(4): 1273-1279
Non-synonymous single nucleotide polymorphisms (nsSNPs) represent common genetic variation that alters encoded amino acids in proteins. All nsSNPs may potentially affect the structure or function of expressed proteins and could therefore have an impact on complex diseases. In an effort to evaluate the phenotypic effect of all known nsSNPs in human DNA repair genes, we have characterized each polymorphism in terms of different functional properties. The properties are computed based on amino acid characteristics (e.g. residue volume change), position-specific phylogenetic information from multiple sequence alignments, and prediction programs such as SIFT (Sorting Intolerant From Tolerant) and PolyPhen (Polymorphism Phenotyping). We provide a comprehensive, updated list of all validated nsSNPs from dbSNP (public database of human single nucleotide polymorphisms at National Center for Biotechnology Information, USA) located in human DNA repair genes. The list includes repair enzymes, genes associated with response to DNA damage as well as genes implicated in genetic instability or sensitivity to DNA damaging agents. Out of a total of 152 genes involved in DNA repair, 95 had validated nsSNPs in them. The fraction of nsSNPs that had high probability of being functionally significant was predicted to be 29.6% and 30.9%, by SIFT and PolyPhen respectively. The resulting list of annotated nsSNPs is available online (http://dna.uio.no/repairSNP), and is an ongoing project that will continue assessing the function of coding SNPs in human DNA repair genes.

6.
7.
With the recognition of the importance of computational approaches to protein-protein interaction prediction, many techniques have been developed to computationally predict protein-protein interactions. However, few of these techniques have been implemented and released as services that general users can readily access and use. In this paper, we design and implement a protein interaction prediction service system based on the domain combination based protein-protein interaction prediction technique, which is known to show superior accuracy to other conventional computational protein-protein interaction prediction methods. In the prediction accuracy test of the method, high sensitivity (77%) and specificity (95%) are achieved for test protein pairs that share domains with the learning sets of proteins in yeast. The stability of the method is also demonstrated through tests on DIP CORE, HMS-PCI, and TAP data. The functions of the system are divided into core, subsidiary, and general service function categories. The core function category includes the functions that can be provided only by using the domain combination based protein-protein interaction prediction method. Interaction prediction for a single protein pair and visualization of interaction probability distributions are the functions in this category. The subsidiary function category includes the functions that can be derived from the core functions. Domain combination pair search with high appearance probability and construction of protein interaction network are the functions in this category. Lastly, the general service function category includes the functions that can be implemented by collecting and organizing the protein and domain data on the Internet. Performance, openness and flexibility are the major design goals and they are achieved by adopting parallel execution techniques, Web Services standards, and layered architecture respectively. In this paper, several representative user interfaces of the system are also introduced with comprehensive usage guides.

8.
Surface protein and polymerase of hepatitis B virus provide a striking example of gene overlap. Inclusion of more coding constraints in the phylogenetic analysis forces the tree toward accepted topology. Three-dimensional protein modeling demonstrates that participation in local protein function underlies the observed mosaic patterns of amino acid conservation and variability. Conserved amino acid residues of polymerase were typically clustered at the catalytic core marked by the YMDD motif. The proposed tertiary structure of surface protein displayed the expected transmembrane helices in a 2-domain constellation. Conserved amino acids, for instance cysteine residues, are involved in the spatial orientation of the two domains, the exposed location of the a-determinant and the dimer formation of surface protein. By means of computational alanine replacement scanning, we demonstrated that the interfaces between domains in monomeric surface protein, between the monomers in dimeric surface protein and in a capsid-surface protein complex mainly consist of relatively well-conserved amino acid residues.

9.
Cerebral edema contributes significantly to morbidity and mortality after brain injury and stroke. Aquaporin-4 (AQP4), a water channel expressed in astrocytes, plays a key role in brain water homeostasis. Genetic variants in other aquaporin family members have been associated with disease phenotypes. However, in human AQP4, only one non-synonymous single-nucleotide polymorphism (nsSNP) has been reported, with no characterization of protein function or disease phenotype. We analyzed DNA from an ethnically diverse cohort of 188 individuals to identify novel AQP4 variants. AQP4 variants were constructed by site-directed mutagenesis and expressed in cells. Water permeability assays in the cells were used to measure protein function. We identified 24 variants in AQP4 including four novel nsSNPs (I128T, D184E, I205L and M224T). We did not observe the previously documented M278T in our sample. The nsSNPs found were rare (approximately 1-2% allele frequency) and heterozygous. Computational analysis predicted reduced-function mutations. Protein expression and membrane localization were similar for reference AQP4 and the five AQP4 mutants. Cellular assays confirmed that four variant AQP4 channels reduced normalized water permeability to between 26 and 48% of the reference (P < 0.001), while the M278T mutation increased normalized water permeability (P < 0.001). We identified multiple novel AQP4 SNPs and showed that four nsSNPs reduced water permeability. The previously reported M278T mutation resulted in gain of function. Our experiments provide insight into the function of the AQP4 protein. These nsSNPs may have clinical implications for patients with cerebral edema and related disorders.

10.
Inferring domain-domain interactions from protein-protein interactions   Total citations: 21 (self-citations: 0; citations by others: 21)
Deng M, Mehta S, Sun F, Chen T. Genome Research, 2002, 12(10): 1540-1548
The interaction between proteins is one of the most important features of protein functions. Behind protein-protein interactions there are protein domains interacting physically with one another to perform the necessary functions. Therefore, understanding protein interactions at the domain level gives a global view of the protein interaction network, and possibly of protein functions. Two research groups used yeast two-hybrid assays to generate 5719 interactions between proteins of the yeast Saccharomyces cerevisiae. This allows us to study the large-scale conserved patterns of interactions between protein domains. Using evolutionarily conserved domains defined in a protein-domain database called PFAM (http://PFAM.wustl.edu), we apply a Maximum Likelihood Estimation method to infer interacting domains that are consistent with the observed protein-protein interactions. We estimate the probabilities of interactions between every pair of domains and measure the accuracies of our predictions at the protein level. Using the inferred domain-domain interactions, we predict interactions between proteins. Our predicted protein-protein interactions have a significant overlap with the protein-protein interactions (MIPS: http://mips.gfs.de) obtained by methods other than the two-hybrid assays. The mean correlation coefficient of the gene expression profiles for our predicted interaction pairs is significantly higher than that for random pairs. Our method has shown robustness in analyzing incomplete data sets and dealing with various experimental errors. We found several novel protein-protein interactions such as RPS0A interacting with APG17 and TAF40 interacting with SPT3, which are consistent with the functions of the proteins.
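
The step from inferred domain-domain interactions to predicted protein-protein interactions can be illustrated as follows. The sketch assumes that domain pairs act independently, so that two proteins interact if at least one of their domain pairs does. The abstract does not spell out this formula, and the function name, domain labels, and probabilities below are hypothetical.

```python
from itertools import product


def protein_interaction_probability(domains_a, domains_b, domain_pair_prob):
    """P(proteins interact) = 1 - product over domain pairs of (1 - p_pair).

    domain_pair_prob maps frozenset({d1, d2}) to an estimated interaction
    probability; pairs missing from the map are treated as non-interacting.
    """
    prob_no_interaction = 1.0
    for d1, d2 in product(domains_a, domains_b):
        p = domain_pair_prob.get(frozenset((d1, d2)), 0.0)
        prob_no_interaction *= 1.0 - p
    return 1.0 - prob_no_interaction


# Hypothetical domain-pair probabilities (not taken from the paper).
pair_prob = {
    frozenset(("D1", "D3")): 0.5,
    frozenset(("D2", "D3")): 0.2,
}
print(protein_interaction_probability(["D1", "D2"], ["D3"], pair_prob))
```

The sketch covers only the prediction step; estimating the domain-pair probabilities themselves by maximum likelihood is a separate and more involved procedure.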

11.
We have developed a new computational algorithm for de novo identification of protein-ligand binding pockets and performed a large-scale validation of the algorithm on two systematically collected datasets from all crystallographic structures in the Protein Data Bank (PDB). This algorithm, called DrugSite, takes a three-dimensional protein structure as input and returns the location, volume and shape of the putative small molecule binding sites by using a physical potential and without any knowledge about a potential ligand molecule. We validated this method using 17,126 binding sites from complexes and apo-structures from the PDB. Out of 5,616 binding sites from protein-ligand complexes, 98.8% were identified by predicted pockets. In proteins having known binding sites, 80.9% were predicted by the largest predicted pocket and 92.7% by the first two. The average ratio of predicted contact area to the total surface area of the protein was 4.7% for the predicted pockets. In only 1.2% of the cases, no "pocket density" was found at the ligand location. Further, 98.6% of 11,510 binding sites collected from apo-structures were predicted. The algorithm is accurate and fast enough to predict protein-ligand binding sites of uncharacterized protein structures, suggest new allosteric druggable pockets, evaluate druggability of protein-protein interfaces and prioritize molecular targets by druggability. Furthermore, the known and the predicted binding pockets for the proteome of a particular organism can be clustered into a "pocketome", that can be used for rapid evaluation of possible binding partners of a given chemical compound.

12.
Variations in the gene encoding uridine diphosphate glucuronosyltransferase 1A1 (UGT1A1) are particularly important because they have been associated with hyperbilirubinemia in Gilbert's and Crigler–Najjar syndromes as well as with changes in drug metabolism. Several variants associated with these phenotypes are nonsynonymous single‐nucleotide polymorphisms (nsSNPs). Bioinformatics approaches have gained increasing importance in predicting the functional significance of these variants. This study was focused on the predictive ability of bioinformatics approaches to determine the pathogenicity of human UGT1A1 nsSNPs, which were previously characterized at the protein level by in vivo and in vitro studies. Using 16 Web algorithms, we evaluated 48 nsSNPs described in the literature and databases. Eight of these algorithms reached or exceeded 90% sensitivity and six presented a Matthews correlation coefficient above 0.46. The best‐performing method was MutPred, followed by Sorting Intolerant from Tolerant (SIFT). The prediction measures varied significantly when predictors such as SIFT, PolyPhen‐2, and Prediction of Pathological Mutations on Proteins were run with their native alignment generated by the tool, or with an input alignment that was strictly built with UGT1A1 orthologs and manually curated. Our results showed that the prediction performance of some methods based on sequence conservation analysis can be negatively affected when nsSNPs are positioned at the hypervariable or constant regions of UGT1A1 ortholog sequences.
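
The two measures quoted in this abstract, sensitivity and the Matthews correlation coefficient, both follow from a predictor's confusion-matrix counts. The sketch below uses the standard definitions; the counts are hypothetical and are not results from the study.

```python
import math


def sensitivity(tp, fn):
    """Fraction of truly pathogenic variants that were predicted pathogenic."""
    return tp / (tp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one predictor (not results from the study).
tp, fn, tn, fp = 4, 1, 3, 2
print(round(sensitivity(tp, fn), 3))
print(round(matthews_corrcoef(tp, tn, fp, fn), 3))
```

Unlike sensitivity alone, the coefficient also penalizes false positives, which is one reason a predictor can reach 90% sensitivity and still score modestly on it.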

13.
14.
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole‐genome/whole‐exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever‐increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species‐independent method with optional species‐specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state‐of‐the‐art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high‐throughput/large‐scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web‐based implementation of FATHMM, including a high‐throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.

15.
The success rate of association studies can be improved by selecting better genetic markers for genotyping or by providing better leads for identifying pathogenic single nucleotide polymorphisms (SNPs) in the regions of linkage disequilibrium with positive disease associations. We have developed a novel algorithm to predict pathogenic single amino acid changes, either nonsynonymous SNPs (nsSNPs) or missense mutations, in conserved protein domains. Using a Bayesian framework, we found that the probability of a microbial missense mutation causing a significant change in phenotype depended on how much difference it made in several phylogenetic, biochemical, and structural features related to the single amino acid substitution. We tested our model on pathogenic allelic variants (missense mutations or nsSNPs) included in OMIM, and on the other nsSNPs in the same genes (from dbSNP) as the nonpathogenic variants. As a result, our model predicted pathogenic variants with a 10% false-positive rate. The high specificity of our prediction algorithm should make it valuable in genetic association studies aimed at identifying pathogenic SNPs.

16.
The problem of describing a protein representation by breaking up the amino acid atoms into functionally similar atom groups has been addressed by many researchers in the past 25 years. They have used a variety of physical, chemical and biological criteria of varying degrees of rigor to essentially impose our understanding of protein structures onto various atom-typing schemes used in studies of protein folding, protein-protein and protein-ligand interactions, and others. Here, instead, we have chosen to rely primarily on the data and use information-theoretic techniques to dissect it. We show that we can obtain an optimized protein representation for a given alphabet size from protein monomers or protein interface datasets that are in agreement with general concepts of protein energetics. Closer inspection of the atom partitions led to interesting observations pointing to the greater importance of the hydrophobic interactions in protein monomers compared to interfaces and, conversely, greater importance of polar/charged interaction in protein interfaces. Comparing the atom partitions from the two datasets we show that the two are strikingly similar at alphabet size of five, proving that despite some differences, the general energetic concepts are very similar for folding and binding. Implications for further structural studies are discussed.

17.
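Many important cellular processes, such as signal transduction, transport, cell motility, and most regulatory mechanisms, are mediated by protein-protein interactions. Physically, these interactions are realized through short residue sequences that form the contact interface between the two interacting proteins. Identifying protein-protein interaction sites, and determining the specificity and strength of the interactions between the residues involved, is a topic with important application prospects, ranging from rational drug design to the analysis of metabolic and signal transduction networks. Although many experimental techniques and computational methods of steadily improving accuracy exist for detecting protein-protein interactions, few can pinpoint the specific residues involved in an interaction and their positions, and this information is required to apply interaction data directly to drug development. With the development of bioinformatics and computational biology, studies of the distinct features of known protein-protein interaction sites have given rise to computational methods that predict such sites from sequence and structure information. This paper briefly reviews computational methods that have made progress in recent years in predicting protein-protein interaction sites, including methods based on genomic information, methods based on protein primary sequence, and methods based on the structures of protein complexes. Although these methods have advanced markedly over the past few years, most research in this area is still in its infancy, and the shortcomings of current databases and experimental techniques considerably affect both the further development and the fair evaluation of computational prediction methods. Much work remains to be done to improve the robustness and reliability of protein-protein interaction site prediction. (Published here is the first part.)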

18.
Hahn CS, Cho YG, Kang BS, Lester IM, Hahn YS. Virology, 2000, 276(1): 127-137
Hepatitis C virus (HCV) is a major human pathogen causing mild to severe liver disease worldwide and is remarkably efficient at establishing persistent infections. Previously, we have shown that the core protein has an immunomodulatory function including the suppression of T lymphocyte responses to viral infection. To investigate the underlying mechanism for the role of core protein in immune modulation, we examined the effect of core on the sensitivity of the human T cell line, Jurkat, to Fas-mediated apoptosis. The transient and stable expression of core protein in Jurkat cells increased the sensitivity of cells to Fas-mediated apoptosis when compared to control cells expressing vector DNA alone. In addition, we demonstrated that the core protein binds to the cytoplasmic domain of Fas which may enhance the downstream signaling event of Fas-mediated apoptosis. The expression of core protein did not alter the cell surface expression of Fas, indicating that the increased sensitivity of core-expressing cells to Fas ligand was not due to upregulation of Fas. Furthermore, we observed the augmentation of caspase-3 activity in core-expressing cells. These results suggest that the core protein may promote the apoptosis of immune cells during HCV infection via the Fas signaling pathway, thus facilitating HCV persistence.

19.
Craniorachischisis (CRN) is a severe neural tube defect (NTD) resulting from failure to initiate closure, leaving the hindbrain and spinal neural tube entirely open. Clues to the genetic basis of this condition come from several mouse models, which harbor mutations in core members of the planar cell polarity (PCP) signaling pathway. Previous studies of humans with CRN failed to identify mutations in the core PCP genes, VANGL1 and VANGL2. Here, we analyzed other key PCP genes: CELSR1, PRICKLE1, PTK7, and SCRIB, with the finding of eight potentially causative mutations in both CELSR1 and SCRIB. Functional effects of these unique or rare human variants were evaluated using known protein-protein interactions as well as subcellular protein localization. While protein interactions were not affected, variants from five of the 36 patients exhibited a profound alteration in subcellular protein localization, with diminution or abolition of trafficking to the plasma membrane. Comparable effects were seen in the crash and spin cycle mouse Celsr1 mutants, and the line-90 mouse Scrib mutant. We conclude that missense variants in CELSR1 and SCRIB may represent a cause of CRN in humans, as in mice, with defective PCP protein trafficking to the plasma membrane a likely pathogenic mechanism.

20.
Proteins function mainly through interactions, especially with DNA and other proteins. While some large-scale interaction networks are now available for a number of model organisms, their experimental generation remains difficult. Consequently, interolog mapping--the transfer of interaction annotation from one organism to another using comparative genomics--is of significant value. Here we quantitatively assess the degree to which interologs can be reliably transferred between species as a function of the sequence similarity of the corresponding interacting proteins. Using interaction information from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Helicobacter pylori, we find that protein-protein interactions can be transferred when a pair of proteins has a joint sequence identity >80% or a joint E-value <10^-70. (These "joint" quantities are the geometric means of the identities or E-values for the two pairs of interacting proteins.) We generalize our interolog analysis to protein-DNA binding, finding such interactions are conserved at specific thresholds between 30% and 60% sequence identity depending on the protein family. Furthermore, we introduce the concept of a "regulog"--a conserved regulatory relationship between proteins across different species. We map interologs and regulogs from yeast to a number of genomes with limited experimental annotation (e.g., Arabidopsis thaliana) and make these available through an online database at http://interolog.gersteinlab.org. Specifically, we are able to transfer approximately 90,000 potential protein-protein interactions to the worm. We test a number of these in two-hybrid experiments and are able to verify 45 overlaps, which we show to be statistically significant.
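
The "joint" thresholds described in this abstract can be applied to a candidate interolog roughly as sketched below. The geometric-mean definition and the two cutoffs come from the abstract; the function names and the alignment statistics are hypothetical and chosen only for illustration.

```python
import math


def geometric_mean(x, y):
    """Geometric mean of two non-negative numbers."""
    return math.sqrt(x * y)


def is_transferable(identity_a, identity_b, evalue_a, evalue_b,
                    min_joint_identity=80.0, max_joint_evalue=1e-70):
    """Apply the thresholds quoted in the abstract to one candidate interolog.

    identity_* are percent identities and evalue_* are E-values for the two
    ortholog pairs (protein A with A', protein B with B').
    """
    joint_identity = geometric_mean(identity_a, identity_b)
    joint_evalue = geometric_mean(evalue_a, evalue_b)
    return joint_identity > min_joint_identity or joint_evalue < max_joint_evalue


# Hypothetical alignment statistics for illustration.
print(geometric_mean(90.0, 80.0))                 # joint identity, about 84.85
print(is_transferable(90.0, 80.0, 1e-40, 1e-30))  # passes on joint identity
print(is_transferable(60.0, 50.0, 1e-40, 1e-30))  # fails both thresholds
```

Because the geometric mean is pulled toward the smaller value, one poorly conserved partner lowers the joint identity more than an arithmetic mean would.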
