Similar literature
 Found 20 similar articles (search time: 39 ms)
1.
2.
Accurate interpretation of genomic variants that alter RNA splicing is critical to precision medicine. We present a computational framework, Prediction of variant Effect on Percent Spliced In (PEPSI), that predicts the splicing impact of coding and noncoding variants for the Fifth Critical Assessment of Genome Interpretation (CAGI5) “Vex‐seq” challenge. PEPSI is a random forest regression model trained on multiple layers of features associated with sequence conservation and regulatory sequence elements. Compared to other splicing defect prediction tools from the literature, our framework integrates secondary structure information in predicting variants that disrupt splicing regulatory elements (SREs). We applied our model to classify splice‐disrupting variants among 2,094 single‐nucleotide polymorphisms from the Exome Aggregation Consortium using model‐predicted changes in percent spliced in (ΔPSI) associated with tested variants. Benchmarking our model against widely used state‐of‐the‐art tools, we demonstrate that PEPSI achieves comparable performance in terms of sensitivity and precision. Moreover, we also show that using secondary structure context can help resolve several cases where changes in the counts of SREs do not correspond with the directionality of ΔPSI measured for tested variants.

3.
In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The critical assessment of genome interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems, and our group has been an active participant in CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI‐5). In particular, we analyze prediction performances of our tools on five CAGI‐5 selected challenges grouped into three different categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their strengths and drawbacks. The aim is to better define the application boundaries of each tool.

4.
BRCA1 and BRCA2 (BRCA1/2) germline variants disrupting the DNA protective role of these genes increase the risk of hereditary breast and ovarian cancers. Correct identification of these variants then becomes clinically relevant, because it may increase the survival rates of the carriers. Unfortunately, we are still unable to systematically predict the impact of BRCA1/2 variants. In this article, we present a family of in silico predictors that address this problem, using a gene‐specific approach. For each protein, we have developed two tools, aimed at predicting the impact of a variant at two different levels: functional and clinical. Testing their performance in different datasets shows that specific information compensates for the small number of predictive features and the reduced training sets employed to develop our models. When applied to the variants of the BRCA1/2 (ENIGMA) challenge in the fifth Critical Assessment of Genome Interpretation (CAGI 5), we find that these methods, particularly those predicting the functional impact of variants, perform well, identifying the large compositional bias towards neutral variants in the CAGI sample. This performance is further improved when estimates of the target variant's impact on splicing are incorporated into our prediction protocol.

5.
Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'kā‐jē/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of the art in genome interpretation and of encouraging new methodological developments. This special issue ( https://onlinelibrary.wiley.com/toc/10981004/2019/40/9 ) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 comprised 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org .

6.
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine‐beta‐synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.

7.
Classification of variants of unknown significance is a challenging technical problem in clinical genetics. As up to one‐third of disease‐causing mutations are thought to affect pre‐mRNA splicing, it is important to accurately classify splicing mutations in patient sequencing data. Several consortia and healthcare systems have conducted large‐scale patient sequencing studies, which discover novel variants faster than they can be classified. Here, we compare the advantages and limitations of several high‐throughput splicing assays aimed at mitigating this bottleneck, and describe a data set of ~5,000 variants that we analyzed using our Massively Parallel Splicing Assay (MaPSy). The Critical Assessment of Genome Interpretation group (CAGI) organized a challenge, in which participants submitted machine learning models to predict the splicing effects of variants in this data set. We discuss the winning submission of the challenge (MMSplice), which outperformed existing software. Finally, we highlight methods to overcome the limitations of MaPSy and similar assays, such as tissue‐specific splicing, the effect of surrounding sequence context, classifying intronic variants, synthesizing large exons, and amplifying complex libraries of minigene species. Further development of these assays will greatly benefit the field of clinical genetics, which lacks high‐throughput methods for variant interpretation.

8.
Identification of pathogenic variants in monogenic diseases is an important aspect of diagnosis, genetic counseling, and prediction of disease severity. Pathogenic mechanisms involved include changes in gene expression, RNA processing, and protein translation. Variants affecting pre‐mRNA splicing are difficult to predict due to the complex mechanism of splicing regulation. A generic approach to systematically detect and characterize effects of sequence variants on splicing would improve current diagnostic practice. Here, it is shown that such an approach is feasible by combining flanking exon RT‐PCR, sequence analysis of PCR products, and exon‐internal quantitative RT‐PCR for all coding exons. Application of this approach to one novel and six previously published variants in the acid‐alpha glucosidase (GAA) gene causing Pompe disease enabled detection of a total of 11 novel splicing events. Aberrant splicing included cryptic splice‐site usage, intron retention, and exon skipping. Importantly, the extent of leaky wild‐type splicing correlated with disease onset and severity. These results indicate that this approach enables sensitive detection and in‐depth characterization of variants affecting splicing, many of which are still unrecognized or poorly understood. The approach is generic and should be adaptable for application to other monogenic diseases to aid in improved diagnostics.

9.
The availability of disease‐specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI‐5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV‐disease relationships.

10.
11.
Many computational approaches estimate the effect of coding variants, but their predictions often disagree with each other. These contradictions confound users and raise questions regarding reliability. Performance assessments can indicate the expected accuracy for each method and highlight advantages and limitations. The Critical Assessment of Genome Interpretation (CAGI) community aims to organize objective and systematic assessments: they challenge predictors on unpublished experimental and clinical data and assign independent assessors to evaluate the submissions. We participated in CAGI experiments as predictors, using the Evolutionary Action (EA) method to estimate the fitness effect of coding mutations. EA is untrained, uses homology information, and relies on a formal equation: the fitness effect equals the functional sensitivity to residue changes multiplied by the magnitude of the substitution. In previous CAGI experiments (between 2011 and 2016), our submissions aimed to predict the protein activity of single mutants. In 2018 (CAGI5), we also submitted predictions regarding clinical associations, folding stability, and matching genomic data with phenotype. For all these diverse challenges, we used EA to predict the fitness effect of variants, adjusted to specifically address each question. Our submissions had consistently good performance, suggesting that EA reliably predicts the effects of genetic variants.
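
The verbal form of the EA equation in the abstract above can be written compactly. This is only a sketch: the symbols are assumptions chosen for illustration (the abstract gives the relation in words only), with $f$ a fitness function of the sequence, $r_i$ the residue at position $i$, $\partial f / \partial r_i$ the functional sensitivity of that position, and $\Delta r_i$ the magnitude of the substitution.

```latex
% Fitness effect = (functional sensitivity to residue changes) x (magnitude of the substitution)
% Notation is illustrative and not taken from the abstract.
\Delta \varphi \;\approx\; \frac{\partial f}{\partial r_i}\,\Delta r_i
```

Read this way, a large substitution at an insensitive position and a small substitution at a highly sensitive position can yield similar predicted fitness effects.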

12.
The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α‐N‐acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to .61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population‐scale analysis of disease epidemiology and rare variant association analysis.

13.
It is possible to estimate the prior probability of pathogenicity for germline disease gene variants based on bioinformatic prediction of variant effect(s). However, routinely used approaches have likely led to the underestimation and underreporting of variants located outside donor and acceptor splice site motifs that affect messenger RNA (mRNA) processing. This review presents information about hereditary cancer gene germline variants, outside native splice sites, with experimentally validated splicing effects. We list 95 exonic variants that impact splicing regulatory elements (SREs) in BRCA1, BRCA2, MLH1, MSH2, MSH6, and PMS2. We utilized a pre‐existing large‐scale BRCA1 functional data set to map functional SREs, and assess the relative performance of different tools to predict effects of 283 variants on such elements. We also describe rare examples of intronic variants that impact branchpoint (BP) sites and create pseudoexons. We discuss the challenges in predicting variant effect on BP site usage and pseudoexonization, and suggest strategies to improve the bioinformatic prioritization of such variants for experimental validation. Importantly, our review and analysis highlight the value of considering the impact of variants outside donor and acceptor motifs on mRNA splicing and disease causation.

14.
Pathogenic genetic variants often primarily affect splicing. However, it remains difficult to quantitatively predict whether and how genetic variants affect splicing. In 2018, the fifth edition of the Critical Assessment of Genome Interpretation proposed two splicing prediction challenges based on experimental perturbation assays: Vex‐seq, assessing exon skipping, and MaPSy, assessing splicing efficiency. We developed a modular modeling framework, MMSplice, the performance of which was among the best on both challenges. Here we provide insights into the modeling assumptions of MMSplice and its individual modules. We furthermore illustrate how MMSplice can be applied in practice for individual genome interpretation, using the MMSplice VEP plugin and the Kipoi variant interpretation plugin, which are directly applicable to VCF files.

15.
Alternative splicing can be disrupted by genetic variants that are related to diseases like cancers. Discovering the influence of genetic variations on alternative splicing will improve the understanding of the pathogenesis of variants. Here, we developed a new approach, PredPSI‐SVR, to predict the impact of variants on exon skipping events using support vector regression. From the sequence of a particular exon and its flanking regions, 42 comprehensive features related to splicing events were extracted. By using a greedy feature selection algorithm, we found eight features contributing most to the prediction. The trained model achieved a Pearson correlation coefficient (PCC) of 0.570 in the 10‐fold cross‐validation based on the training data set provided by the “vex‐seq” challenge of the 5th Critical Assessment of Genome Interpretation. In the blind test also held by the challenge, our prediction ranked second with a PCC of 0.566, which demonstrates the robustness of our method. A further test indicated that PredPSI‐SVR is helpful in prioritizing deleterious synonymous mutations. The method is available on https://github.com/chenkenbio/PredPSI‐SVR .
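
The Pearson correlation coefficient (PCC) quoted in the abstract above is the scoring metric between predicted and measured values. The following is a minimal standard-library sketch of that metric only; the function name `pearson_r` and the toy inputs are hypothetical and are not taken from the PredPSI‐SVR code.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Toy values for illustration only; not challenge data.
    print(pearson_r([0.1, -0.2, 0.05, -0.4], [0.15, -0.1, 0.0, -0.5]))
```

A PCC near 1 means predicted and measured values rise and fall together; it does not by itself show that the predictions are correctly scaled.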

16.
The CAGI‐5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss of function variants. Six groups participated and were asked to predict the probability of effect and standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures to arrive at a final ranking, which highlights the strengths and weaknesses of each group. The results show a great variety of predictions where some methods performed significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased to identify disease phenotypes. The best predictor, Bromberg lab, used a neural‐network‐based method able to discriminate between neutral and non‐neutral single nucleotide polymorphisms. The CAGI‐5 PCM1 challenge allowed us to evaluate the state‐of‐the‐art techniques for interpreting the effect of novel variants for a difficult target protein.

17.
Improving predictions of phenotypic consequences for genomic variants is part of ongoing efforts in the scientific community to gain meaningful insights into genomic function. Within the framework of the critical assessment of genome interpretation experiments, we participated in the Vex‐seq challenge, which required predicting the change in the percent spliced in measure (ΔΨ) for 58 exons caused by more than 1,000 genomic variants. Experimentally determined through the Vex‐seq assay, the Ψ quantifies the fraction of reads that include an exon of interest. Predicting the change in Ψ associated with specific genomic variants implies determining the sequence changes relevant for splicing regulators, such as splicing enhancers and silencers. Here we took advantage of two computational tools, SplicePort and SPANR, that incorporate relevant sequence features in their models of splice sites and exon‐inclusion level, respectively. Specifically, we used the SplicePort and SPANR outputs to build mathematical models of the experimental data obtained for the variants in the training set, which we then used to predict the ΔΨ associated with the mutations in the test set. We show that the sequence changes captured by these computational tools provide a reasonable foundation for modeling the impact on splicing associated with genomic variants.
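
The definition of Ψ in the abstract above (the fraction of reads that include the exon) and of ΔΨ (the variant's Ψ minus the reference Ψ) can be illustrated with a short sketch. The function names and read counts below are hypothetical, and Ψ is kept on a 0 to 1 scale here; a real Vex‐seq analysis involves read mapping and normalisation steps that are not shown.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Psi as the fraction of reads that include the exon (0.0 to 1.0)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads cover this exon")
    return inclusion_reads / total


def delta_psi(reference_counts, variant_counts):
    """Change in Psi caused by a variant: Psi(variant) - Psi(reference).

    Each argument is an (inclusion_reads, exclusion_reads) pair.
    """
    return percent_spliced_in(*variant_counts) - percent_spliced_in(*reference_counts)


if __name__ == "__main__":
    # Made-up counts: the variant lowers exon inclusion from 80% to 50%.
    print(delta_psi(reference_counts=(80, 20), variant_counts=(50, 50)))
```

A negative ΔΨ in this convention means the variant reduces exon inclusion (more skipping), and a positive value means it increases inclusion.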

18.
19.
The recent years have seen a drastic increase in the amount of available genomic sequences. Alongside this explosion, hundreds of computational tools were developed to assess the impact of observed genetic variation. Critical Assessment of Genome Interpretation (CAGI) provides a platform to evaluate the performance of these tools in experimentally relevant contexts. In the CAGI‐5 challenge assessing the 38 missense variants affecting the human Pericentriolar material 1 protein (PCM1), our SNAP‐based submission was the top performer, although it did worse than expected from other evaluations. Here, we compare the CAGI‐5 submissions, and 24 additional commonly used variant effect predictors, to analyze the reasons for this observation. We identified per residue conservation, structural, and functional PCM1 characteristics, which may be responsible. As expected, predictors had a hard time distinguishing effect variants in nonconserved positions. They were also better able to call effect variants in a structurally rich region than in a less‐structured one; in the latter, they more often correctly identified benign than effect variants. Curiously, most of the protein was predicted to be functionally robust to mutation—a feature that likely makes it a harder problem for generalized variant effect predictors.

20.
Reliable methods for predicting functional consequences of variants in disease genes would be beneficial in the clinical setting. This study was undertaken to predict, and confirm in vitro, splicing aberrations associated with mismatch repair (MMR) variants identified in familial colon cancer patients. Six programs were used to predict the effect of 13 MLH1 and 6 MSH2 gene variants on pre‐mRNA splicing. mRNA from cycloheximide‐treated lymphoblastoid cell lines of variant carriers was screened for splicing aberrations. Tumors of variant carriers were tested for microsatellite instability and MMR protein expression. Variant segregation in families was assessed using Bayes factor causality analysis. Amino acid alterations were examined for evolutionary conservation and physicochemical properties. Splicing aberrations were detected for 10 variants, including a frameshift as a minor cDNA product, and altered ratio of known alternate splice products. Loss of splice sites was well predicted by splice‐site prediction programs SpliceSiteFinder (90%) and NNSPLICE (90%), but consequence of splice site loss was less accurately predicted. No aberrations correlated with ESE predictions for the nine exonic variants studied. Seven of eight missense variants had normal splicing (88%), but only one was a substitution considered neutral from evolutionary/physicochemical analysis. Combined with information from tumor and segregation analysis, and literature review, 16 of 19 variants were considered clinically relevant. Bioinformatic tools for prediction of splicing aberrations need improvement before use without supporting studies to assess variant pathogenicity. Classification of mismatch repair gene variants is assisted by a comprehensive approach that includes in vitro, tumor pathology, clinical, and evolutionary conservation data. Hum Mutat 0, 1–14, 2009. © 2009 Wiley‐Liss, Inc.
