Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole‐genome/whole‐exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever‐increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species‐independent method with optional species‐specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state‐of‐the‐art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high‐throughput/large‐scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web‐based implementation of FATHMM, including a high‐throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.

2.
The success rate of association studies can be improved by selecting better genetic markers for genotyping or by providing better leads for identifying pathogenic single nucleotide polymorphisms (SNPs) in the regions of linkage disequilibrium with positive disease associations. We have developed a novel algorithm to predict pathogenic single amino acid changes, either nonsynonymous SNPs (nsSNPs) or missense mutations, in conserved protein domains. Using a Bayesian framework, we found that the probability of a microbial missense mutation causing a significant change in phenotype depended on how much difference it made in several phylogenetic, biochemical, and structural features related to the single amino acid substitution. We tested our model on pathogenic allelic variants (missense mutations or nsSNPs) included in OMIM, and on the other nsSNPs in the same genes (from dbSNP) as the nonpathogenic variants. As a result, our model predicted pathogenic variants with a 10% false-positive rate. The high specificity of our prediction algorithm should make it valuable in genetic association studies aimed at identifying pathogenic SNPs.

3.
4.
Accounting for human polymorphisms predicted to affect protein function   (total citations: 24; self-citations: 0; citations by others: 24)
Ng PC, Henikoff S. Genome Research, 2002, 12(3): 436-446
A major interest in human genetics is to determine whether a nonsynonymous single-base nucleotide polymorphism (nsSNP) in a gene affects its protein product and, consequently, impacts the carrier's health. We used the SIFT (Sorting Intolerant From Tolerant) program to predict that 25% of 3084 nsSNPs from dbSNP, a public SNP database, would affect protein function. Some of the nsSNPs predicted to affect function were variants known to be associated with disease. Others were artifacts of SNP discovery. Two reports have indicated that there are thousands of damaging nsSNPs in an individual's human genome; we find the number is likely to be much lower.

5.
The ability to predict the effect of nonsynonymous SNPs (nsSNPs) on protein function is important for the success of genetic disease association studies. Here we present a statistical geometry approach to nsSNP classification based on Delaunay tessellation, whereby the impact of nsSNPs on protein function is correlated with the change in the four-body statistical potential (DeltaQ) of the protein caused by the amino acid substitution. We observed that the DeltaQ of polymorphic proteins with disease-associated nsSNPs (daSNPs) was on average significantly lower than the DeltaQ of the proteins with neutral SNPs (ntSNPs). Clustering amino acid substitutions into conservative and nonconservative groups, and using a three-letter alphabet based on side-chain polarity showed significantly lower DeltaQ in nonconservative changes to daSNPs and when hydrophobic residues were substituted by charged or by polar residues. We also found that the daSNPs in the protein core caused much lower DeltaQ than surface daSNPs. This approach demonstrates a strong correlation between the computed DeltaQ and SNP classification. Integration of our approach with the existing models will help achieve a more precise recognition of nsSNPs that underlie polygenic diseases. All of the programs were written in Java and are available from the authors upon request.

6.
A brain-computer interface (BCI) system allows direct communication between the brain and the external world. Common spatial pattern (CSP) has been used effectively for feature extraction of data used in BCI systems. However, many studies show that the performance of a BCI system using CSP largely depends on the filter parameters. The filter parameters that yield the most discriminating information vary from subject to subject, and manual tuning of the filter parameters is a difficult and time-consuming exercise. In this paper, we propose a new automated filter tuning approach for motor imagery electroencephalography (EEG) signal classification, which automatically and flexibly finds the filter parameters for optimal performance. We have evaluated the performance of our proposed method on two public benchmark datasets. Compared to the existing conventional CSP approach, our method reduces the average classification error rate by 2.89% and 3.61% for BCI Competition III dataset IVa and BCI Competition IV dataset I, respectively. Moreover, our proposed approach also achieved the lowest average classification error rate compared to the state-of-the-art methods studied in this paper. Thus, our proposed method can potentially be used for developing improved BCI systems, which can assist people with disabilities to recover their environmental control. It can also be used for enhanced disease recognition such as epileptic seizure detection using EEG signals.

7.
Most diseases, including those of genetic origin, express a continuum of severity. Clinical interventions for numerous diseases are based on the severity of the phenotype. Predicting severity due to genetic variants could facilitate diagnosis and choice of therapy. Although computational predictions have been used as evidence for classifying the disease relevance of genetic variants, special tools for predicting disease severity at large scale are missing. Here, we manually curated a dataset containing variants leading to severe and less severe phenotypes and studied the abilities of variation impact predictors to distinguish between them. We found that these tools cannot separate the two groups of variants. Then, we developed a novel machine‐learning‐based method, PON‐PS (http://structure.bmc.lu.se/PON-PS), for the classification of amino acid substitutions associated with benign, severe, and less severe phenotypes. We tested the method using an independent test dataset and variants in four additional proteins. For distinguishing severe and nonsevere variants, PON‐PS showed an accuracy of 61% in the test dataset, which is higher than for existing tolerance prediction methods. PON‐PS is the first generic tool developed for this task. The tool can be used together with other evidence for improving diagnosis and prognosis and for prioritization of preventive interventions, clinical monitoring, and molecular tests.

8.

Background

At the Division of Functional Genomics, Research Center for Bioscience and Technology, Tottori University, we have been making an effort to establish a genetic testing facility that can provide the same screening procedures conducted worldwide.

Methods

Direct Sequencing of PCR products is the main method to detect point mutations, small deletions and insertions. Multiplex Ligation-dependent Probe Amplification (MLPA) was used to detect large deletions or insertions. Expansion of the repeat was analyzed for triplet repeat diseases. Original primers were constructed for 41 diseases when the reported primers failed to amplify the gene. Prediction of functional effects of human nsSNPs (PolyPhen) was used for evaluation of novel mutations.

Results

From January 2000 to September 2013, a total of 1,006 DNA samples were subjected to genetic testing in the Division of Functional Genomics, Research Center for Bioscience and Technology, Tottori University. The hospitals that requested genetic testing were located in 43 prefectures in Japan and in 11 foreign countries. The genetic testing covered 62 diseases, and mutations were detected in 287 out of 1,006 with an average mutation detection rate of 24.7%. There were 77 samples for prenatal diagnosis. The number of samples has rapidly increased since 2010.

Conclusion

In 2013, the next-generation sequencers were introduced in our facility and are expected to provide more comprehensive genetic testing in the near future. Nowadays, genetic testing is a popular and powerful tool for diagnosis of many genetic diseases. Our genetic testing should be further expanded in the future.

9.
Numerous mismatch repair (MMR) gene variants have been identified in Lynch syndrome and other cancer patients, but knowledge about their pathogenicity is frequently missing. The diagnosis and treatment of patients would benefit from knowing which variants are disease related. Bioinformatic approaches are well suited to the problem and can handle large numbers of cases. Functional effects were revealed based on literature for 168 MMR missense variants. Performance of numerous prediction methods was tested with this dataset. Among the tested tools, only the results of tolerance prediction methods correlated to functional information, however, with poor performance. Therefore, a novel consensus-based predictor was developed. The novel prediction method, pathogenic-or-not mismatch repair (PON-MMR), achieved accuracy of 0.87 and Matthews correlation coefficient of 0.77 on the experimentally verified variants. When applied to 616 MMR cases with unknown effects, 81 missense variants were predicted to be pathogenic and 167 neutral. With PON-MMR, the number of MMR missense variants with unknown effect was reduced by classifying a large number of cases as likely pathogenic or benign. The results can be used, for example, to prioritize cases for experimental studies and assist in the classification of cases.
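The two performance measures quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be derived from a binary confusion matrix. The following is a minimal Python sketch of the standard definitions; the function name and the counts are hypothetical and purely illustrative, and they are not the PON-MMR results.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # MCC denominator: geometric mean of the four marginal totals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts (NOT taken from the abstract).
acc, mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when the pathogenic and neutral classes are unbalanced.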

10.

Introduction

Single-nucleotide polymorphisms (SNPs) are biomarkers for exploring the genetic basis of many complex human diseases. The prediction of SNPs is promising in modern genetic analysis, but it is still a great challenge to identify the functional SNPs in a disease-related gene. The computational approach has overcome this challenge, and an increase in the success rate of genetic association studies and a reduced cost of genotyping have been achieved. The objective of this study is to identify deleterious non-synonymous SNPs (nsSNPs) associated with the COL1A1 gene.

Material and methods

The SNPs were retrieved from the Single Nucleotide Polymorphism Database (dbSNP). Using I-Mutant, protein stability change was calculated. The potentially functional nsSNPs and their effect on proteins were predicted by PolyPhen and SIFT respectively. FASTSNP was used for estimation of risk score.

Results

Our analysis revealed 247 SNPs as non-synonymous, out of which 5 nsSNPs were found to be least stable by I-Mutant 2.0 with a DDG value of > –1.0. Four nsSNPs, namely rs17853657, rs17857117, rs57377812 and rs1059454, showed a highly deleterious tolerance index score of 0.00 with a change in their physicochemical properties by the SIFT server. Seven nsSNPs, namely rs1059454, rs8179178, rs17853657, rs17857117, rs72656340, rs72656344 and rs72656351, were found to be probably damaging with a PSIC score difference between 2.0 and 3.5 by the PolyPhen server. Three nsSNPs, namely rs1059454, rs17853657 and rs17857117, were found to be highly polymorphic with a risk score of 3-4 with a possible effect of non-conservative change and splicing regulation by FASTSNP.

Conclusions

Three nsSNPs, namely rs1059454, rs17853657 and rs17857117, are potential functional polymorphisms that are likely to have a functional impact on the COL1A1 gene.
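The three nsSNPs named in the conclusion are exactly those flagged by all three tools in the Results. As a small illustrative check in Python, using only the rs identifiers listed above (the variable names are hypothetical, and this is not the authors' pipeline), the consensus can be obtained as a set intersection:

```python
# nsSNPs flagged by each tool, as listed in the Results above.
sift = {"rs17853657", "rs17857117", "rs57377812", "rs1059454"}
polyphen = {
    "rs1059454", "rs8179178", "rs17853657", "rs17857117",
    "rs72656340", "rs72656344", "rs72656351",
}
fastsnp = {"rs1059454", "rs17853657", "rs17857117"}

# Variants on which all three predictors agree.
consensus = sift & polyphen & fastsnp
print(sorted(consensus))
```

Taking the intersection is the strictest way to combine predictors; a majority vote would be a looser alternative, although the abstract does not state which rule the authors applied.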

11.
Several computational methods have been developed for predicting the effects of rapidly expanding variation data. Comparison of the performance of tools has been very difficult as the methods have been trained and tested with different datasets. Until now, unbiased and representative benchmark datasets have been missing. We have developed a benchmark database suite, VariBench, to overcome this problem. VariBench contains datasets of experimentally verified high‐quality variation data carefully chosen from literature and relevant databases. It provides the mapping of variation position to different levels (protein, RNA and DNA sequences, protein three‐dimensional structure), along with identifier mapping to relevant databases. VariBench contains the first benchmark datasets for variation effect analysis, a field which is of high importance and where many developments are currently going on. VariBench datasets can be used, for example, to test performance of prediction tools as well as to train novel machine learning‐based tools. New datasets will be included and the community is encouraged to submit high‐quality datasets to the service. VariBench is freely available at http://structure.bmc.lu.se/VariBench.

12.
13.
Mutations in the SMPX gene can disrupt the regular activity of the SMPX protein, which is involved in the hearing process. Recent reports show a link between nonsynonymous single-nucleotide polymorphisms (nsSNPs) in SMPX and hearing loss; thus, classifying deleterious SNPs in SMPX will be an uphill task before designing a more extensive population study. In this study, damaging nsSNPs of SMPX from the dbSNP database were identified by using 13 bioinformatics tools. Initially, the impact of nsSNPs in the SMPX gene was evaluated through different in silico predictors, and the deleterious convergent changes were analyzed by energy-minimization-guided residual network analysis. In addition, the pathogenic effects of mutations in SMPX-mediated protein–protein interactions were also characterized by structural modeling and binding energy calculations. A total of four mutations (N19D, A29T, K54N, and S71L), all located in highly conserved regions, were found to be highly deleterious by all the tools. Furthermore, all four mutants showed structural alterations, and the communities of amino acids for mutant proteins were readily changed, compared to the wild-type. Among them, A29T (rs772775896) was revealed as the most damaging nsSNP, which caused significant structural deviation of the SMPX protein, thereby reducing the binding affinity to other functional partners. These findings provide computational insights into the deleterious role of nsSNPs in SMPX, which might be helpful for guiding wet-lab confirmatory analysis.

14.
Newborn screening programs for severe metabolic disorders using tandem mass spectrometry are widely used. Medium-Chain Acyl-CoA dehydrogenase deficiency (MCADD) is the most prevalent mitochondrial fatty acid oxidation defect (1:15,000 newborns) and it has been proven that early detection of this metabolic disease decreases mortality and improves the outcome. In previous studies, data mining methods on derivatized tandem MS datasets have shown high classification accuracies. However, no machine learning methods have so far been applied to datasets based on non-derivatized screening methods. A dataset with 44,159 blood samples was collected using a non-derivatized screening method as part of a systematic newborn screening by the PCMA screening center (Belgium). Twelve MCADD cases were present in this partially MCADD-enriched dataset. We extended three data mining methods, namely C4.5 decision trees, logistic regression and ridge logistic regression, with a parameter and threshold optimization method and evaluated their applicability as a diagnostic support tool. Within a stratified cross-validation setting, a grid search was performed for each model for a wide range of model parameters, included variables and classification thresholds. The best performing model used ridge logistic regression and achieved a sensitivity of 100%, a specificity of 99.987% and a positive predictive value of 32% (recalibrated for a real population), obtained in a stratified cross-validation setting. These results were further validated on an independent test set. Using a method that combines ridge logistic regression with variable selection and threshold optimization, a significantly improved performance was achieved compared to the current state-of-the-art for derivatized data, while retaining more interpretability and requiring fewer variables. The results indicate the potential value of data mining methods as a diagnostic support tool.
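The phrase "recalibrated for a real population" refers to the fact that positive predictive value depends on disease prevalence as well as on sensitivity and specificity. A minimal Python sketch of that relationship via Bayes' rule follows; the function name is hypothetical, and using the 1:15,000 figure as the prevalence is an assumption made here for illustration, since the abstract does not state the exact recalibration procedure.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(disease | positive test), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence                  # diseased and flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # healthy but flagged
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity from the abstract; prevalence ASSUMED to be 1:15,000.
ppv = positive_predictive_value(
    sensitivity=1.0, specificity=0.99987, prevalence=1 / 15000
)
print(f"PPV = {ppv:.3f}")
```

Under this assumed prevalence the result is roughly one third, the same order as the 32% reported; the small gap may reflect a different prevalence estimate or rounding in the published figures. The sketch also shows why even a specificity above 99.98% yields a modest PPV for a disorder this rare.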

15.
Nakken S, Alseth I, Rognes T. Neuroscience, 2007, 145(4): 1273-1279
Non-synonymous single nucleotide polymorphisms (nsSNPs) represent common genetic variation that alters encoded amino acids in proteins. All nsSNPs may potentially affect the structure or function of expressed proteins and could therefore have an impact on complex diseases. In an effort to evaluate the phenotypic effect of all known nsSNPs in human DNA repair genes, we have characterized each polymorphism in terms of different functional properties. The properties are computed based on amino acid characteristics (e.g. residue volume change), position-specific phylogenetic information from multiple sequence alignments, and prediction programs such as SIFT (Sorting Intolerant From Tolerant) and PolyPhen (Polymorphism Phenotyping). We provide a comprehensive, updated list of all validated nsSNPs from dbSNP (public database of human single nucleotide polymorphisms at National Center for Biotechnology Information, USA) located in human DNA repair genes. The list includes repair enzymes, genes associated with response to DNA damage as well as genes implicated with genetic instability or sensitivity to DNA damaging agents. Out of a total of 152 genes involved in DNA repair, 95 had validated nsSNPs in them. The fraction of nsSNPs that had high probability of being functionally significant was predicted to be 29.6% and 30.9%, by SIFT and PolyPhen respectively. The resulting list of annotated nsSNPs is available online (http://dna.uio.no/repairSNP), and is an ongoing project that will continue assessing the function of coding SNPs in human DNA repair genes.

16.
The immune activity of an antibody is directed against a specific region on its target antigen known as the epitope. Numerous immunodetection and immunotherapeutic applications are based on the ability of antibodies to recognize epitopes. The detection of immunogenic regions is often an essential step in these applications. The experimental approaches used for detecting immunogenic regions are often laborious and resource-intensive. Thus, computational methods for the prediction of immunogenic regions alleviate this drawback by guiding the experimental procedures. In this work we developed a computational method for the prediction of immunogenic regions from either the protein three-dimensional structure or sequence when the structure is unavailable. The method implements a machine-learning algorithm that was trained to recognize immunogenic patterns based on a large benchmark dataset of validated epitopes derived from antigen structures and sequences. We compare our method to other available tools that perform the same task and show that it outperforms them.

17.
18.
Variations in mismatch repair (MMR) system genes are causative of Lynch syndrome and other cancers. Thousands of variants have been identified in MMR genes, but the clinical relevance is known for only a small proportion. Recently, the InSiGHT group classified 2,360 MMR variants into five classes. One‐third of variants, the majority of which are nonsynonymous variants, remain of uncertain clinical relevance. Computational tools can be used to prioritize variants for disease relevance investigations. Previously, we classified 248 MMR variants as likely pathogenic and likely benign using PON‐MMR. We have developed a novel tool, PON‐MMR2, which is trained on a larger and more reliable dataset. In performance comparison, PON‐MMR2 outperforms both generic tolerance prediction methods and methods optimized for MMR variants. It achieves accuracy and MCC of 0.89 and 0.78, respectively, in cross‐validation and 0.86 and 0.69, respectively, on an independent test dataset. We classified 354 class 3 variants in the InSiGHT database as well as all possible amino acid substitutions in four MMR proteins. Likely harmful variants mainly appear in the protein core, whereas likely benign variants are on the surface. PON‐MMR2 is a highly reliable tool to prioritize variants for functional analysis. It is freely available at http://structure.bmc.lu.se/PON-MMR2/.

19.
Variations in the gene encoding uridine diphosphate glucuronosyltransferase 1A1 (UGT1A1) are particularly important because they have been associated with hyperbilirubinemia in Gilbert's and Crigler–Najjar syndromes as well as with changes in drug metabolism. Several variants associated with these phenotypes are nonsynonymous single‐nucleotide polymorphisms (nsSNPs). Bioinformatics approaches have gained increasing importance in predicting the functional significance of these variants. This study was focused on the predictive ability of bioinformatics approaches to determine the pathogenicity of human UGT1A1 nsSNPs, which were previously characterized at the protein level by in vivo and in vitro studies. Using 16 Web algorithms, we evaluated 48 nsSNPs described in the literature and databases. Eight of these algorithms reached or exceeded 90% sensitivity and six presented a Matthews correlation coefficient above 0.46. The best‐performing method was MutPred, followed by Sorting Intolerant from Tolerant (SIFT). The prediction measures varied significantly when predictors such as SIFT, PolyPhen‐2, and Prediction of Pathological Mutations on Proteins were run with their native alignment generated by the tool, or with an input alignment that was strictly built with UGT1A1 orthologs and manually curated. Our results showed that the prediction performance of some methods based on sequence conservation analysis can be negatively affected when nsSNPs are positioned at the hypervariable or constant regions of UGT1A1 ortholog sequences.

20.
Cerebral edema contributes significantly to morbidity and mortality after brain injury and stroke. Aquaporin-4 (AQP4), a water channel expressed in astrocytes, plays a key role in brain water homeostasis. Genetic variants in other aquaporin family members have been associated with disease phenotypes. However, in human AQP4, only one non-synonymous single-nucleotide polymorphism (nsSNP) has been reported, with no characterization of protein function or disease phenotype. We analyzed DNA from an ethnically diverse cohort of 188 individuals to identify novel AQP4 variants. AQP4 variants were constructed by site-directed mutagenesis and expressed in cells. Water permeability assays in the cells were used to measure protein function. We identified 24 variants in AQP4 including four novel nsSNPs (I128T, D184E, I205L and M224T). We did not observe the previously documented M278T in our sample. The nsSNPs found were rare (approximately 1-2% allele frequency) and heterozygous. Computational analysis predicted reduced function mutations. Protein expression and membrane localization were similar for reference AQP4 and the five AQP4 mutants. Cellular assays confirmed that four variant AQP4 channels reduced normalized water permeability to between 26 and 48% of the reference (P < 0.001), while the M278T mutation increased normalized water permeability (P < 0.001). We identified multiple novel AQP4 SNPs and showed that four nsSNPs reduced water permeability. The previously reported M278T mutation resulted in gain of function. Our experiments provide insight into the function of the AQP4 protein. These nsSNPs may have clinical implications for patients with cerebral edema and related disorders.
