Similar Articles
20 similar articles found (search time: 31 ms)
1.
Several computational methods have been developed for predicting the effects of the rapidly expanding body of variation data. Comparing the performance of these tools has been very difficult because the methods have been trained and tested on different datasets, and unbiased, representative benchmark datasets have so far been missing. We have developed a benchmark database suite, VariBench, to overcome this problem. VariBench contains datasets of experimentally verified, high‐quality variation data carefully chosen from the literature and relevant databases. It maps each variation position to different levels (protein, RNA and DNA sequences, and protein three‐dimensional structure) and provides identifier mappings to relevant databases. VariBench contains the first benchmark datasets for variation effect analysis, a field of high importance in which many developments are currently under way. VariBench datasets can be used, for example, to test the performance of prediction tools and to train novel machine learning‐based tools. New datasets will be added, and the community is encouraged to submit high‐quality datasets to the service. VariBench is freely available at http://structure.bmc.lu.se/VariBench.
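
The benchmarking use described above can be illustrated with a minimal sketch that scores a binary predictor against labeled benchmark variants. It assumes predictions and reference labels are already paired per variant as booleans (True meaning deleterious); the function name is hypothetical, and nothing here reflects an actual VariBench file format or interface.

```python
import math
from typing import Dict, Sequence


def benchmark_metrics(predicted: Sequence[bool], truth: Sequence[bool]) -> Dict[str, float]:
    """Score binary predictions (True = deleterious) against benchmark labels."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)

    def ratio(num: int, den: int) -> float:
        # NaN signals an undefined metric, e.g. no positive examples.
        return num / den if den else float("nan")

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


print(benchmark_metrics([True, True, False, False], [True, False, False, True]))
```

The Matthews correlation coefficient is included because, unlike accuracy, it stays informative when a benchmark contains many more neutral than deleterious variants.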

2.
Whole‐genome sequencing (WGS) studies are uncovering disease‐associated variants in both rare and nonrare diseases. Using next‐generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS data with up‐to‐date genomic information remains challenging for biomedical researchers. Here, we present gNOME, one of the fastest, highly scalable annotation, filtering, and analysis pipelines, which prioritizes phenotype‐associated variants while minimizing false‐positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype‐associated variants, and result summaries are provided at the variant, gene, and genome levels. Moreover, enrichment analyses of specific variants, genes, and gene sets, either between two groups or against the population‐scale WGS datasets already integrated in the pipeline, can aid interpretation. We found a small number of discordant results between annotation tools, partly due to different reporting strategies for variants with complex impacts. Using two published whole‐exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss‐of‐function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).

3.
Next‐generation sequencing (NGS) has become a powerful and efficient tool for routine mutation screening in clinical research. As each NGS test yields hundreds of variants, the current challenge is to meaningfully interpret the data and select potential candidates. Analyzing each variant while manually investigating several relevant databases to collect specific information is a cumbersome and time‐consuming process, and it requires expertise and familiarity with these databases. Thus, a tool that can seamlessly annotate variants with clinically relevant databases under one common interface would be of great help for variant annotation, cross‐referencing, and visualization. Such a tool would allow variants to be processed in an automated and high‐throughput manner and facilitate the investigation of variants in several genome browsers. Several analysis tools are available for raw sequencing‐read processing and variant identification, but an automated variant filtering, annotation, cross‐referencing, and visualization tool is still lacking. To fulfill these requirements, we developed DaMold, a Web‐based, user‐friendly tool that can filter and annotate variants and can access and compile information from 37 resources. It is easy to use, provides flexible input options, and accepts variants from NGS and Sanger sequencing as well as hotspots in VCF and BED formats. DaMold is available as an online application at http://damold.platomics.com/index.html, and as a Docker container and virtual machine at https://sourceforge.net/projects/damold/.

4.
Clinical mutation screening of the cancer susceptibility genes BRCA1 and BRCA2 generates many unclassified variants (UVs). Most of these UVs are either rare missense substitutions or nucleotide substitutions near the splice junctions of the protein‐coding exons. Previously, we developed a quantitative method for evaluating BRCA gene UVs, the “integrated evaluation,” which combines a sequence analysis‐based prior probability of pathogenicity with patient and/or tumor observational data to arrive at a posterior probability of pathogenicity. One limitation of the sequence analysis‐based prior has been that it evaluates UVs in terms of missense substitution severity but not their probability of disrupting normal mRNA splicing. Here, we calibrated output from the splice‐site fitness program MaxEntScan to generate spliceogenicity‐based prior probabilities of pathogenicity for BRCA gene variants; these range from 0.97 for variants with a high probability of damaging a donor or acceptor to 0.02 for exonic variants that do not impact a splice junction and are unlikely to create a de novo donor. We created a database (http://priors.hci.utah.edu/PRIORS/) that provides the combined missense substitution severity and spliceogenicity‐based probability of pathogenicity for BRCA gene single‐nucleotide substitutions. We also updated the BRCA gene Ex‐UV LOVD, available at http://hci‐exlovd.hci.utah.edu, with 77 re‐evaluable variants.
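
The prior‐plus‐data idea behind the integrated evaluation can be sketched with Bayes' rule in odds form. This is a generic illustration, not the authors' model: the likelihood ratio stands in for whatever the patient and tumor observational data contribute, and its value in the example is hypothetical.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability of pathogenicity with a likelihood ratio
    (Bayes' rule in odds form: posterior odds = prior odds x LR)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# The two ends of the reported spliceogenicity-based prior range, each
# combined with a hypothetical likelihood ratio of 10 favoring pathogenicity.
for prior in (0.02, 0.97):
    print(f"prior {prior:.2f} -> posterior {posterior_probability(prior, 10.0):.3f}")
```

Even a tenfold likelihood ratio moves a 0.02 prior only to about 0.17, which shows why a well-calibrated prior matters for variants with little observational data.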

5.
Brazilians are a highly admixed population with ancestry from Europe, Africa, the Americas, and Asia, yet they remain underrepresented in genomic databanks. Here we present a collection of exomic variants from 609 elderly Brazilians in a census‐based cohort (SABE609) with comprehensive phenotyping. Variants were deposited in ABraOM (Online Archive of Brazilian Mutations), a Web‐based public database. Population‐representative phenotype and genotype repositories are essential for variant interpretation through allele frequency filtering, and because elderly individuals are less likely to harbor pathogenic mutations for early‐ and adult‐onset diseases, variant databases built from them are of particular interest. Of the more than 2.3 million variants from this cohort, 1,282,008 were high‐confidence calls. Importantly, 207,621 variants were absent from major public databases. We found 9,791 potential loss‐of‐function variants, with about 300 mutations per individual. Pathogenic variants in clinically relevant (ACMG) genes were observed in 1.15% of the individuals and were correlated with clinical phenotype. We estimated the incidence of prevalent recessive disorders from heterozygote frequencies and concluded that such estimates rely on appropriate pathogenicity assertion. These observations illustrate the relevance of collecting demographic data from diverse, poorly characterized populations. Census‐based datasets of aged individuals with comprehensive phenotyping are an invaluable resource for improving the understanding of variant pathogenicity.
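
As a sketch of estimating recessive incidence from heterozygote frequency, the function below assumes Hardy–Weinberg equilibrium and pools all alleles asserted pathogenic into one allele class; the carrier frequency in the example is hypothetical, not a cohort result.

```python
import math


def recessive_incidence_from_carriers(carrier_freq: float) -> float:
    """Estimate autosomal recessive incidence from the heterozygote (carrier)
    frequency, assuming Hardy-Weinberg equilibrium and one pooled pathogenic
    allele class."""
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("carrier frequency must lie in [0, 0.5] under HWE")
    # Solve 2q(1 - q) = carrier_freq for the pathogenic allele frequency q
    # (taking the root with q <= 0.5).
    q = (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0
    # Affected individuals carry two pathogenic alleles: frequency q**2.
    return q * q


# Hypothetical carrier frequency of 1 in 100.
incidence = recessive_incidence_from_carriers(0.01)
print(f"about 1 affected birth in {1 / incidence:,.0f}")
```

Because incidence scales roughly with the square of the carrier frequency, counting a few misclassified benign alleles as pathogenic inflates the estimate disproportionately, which is one way to see why such estimates depend on appropriate pathogenicity assertion.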

6.
Discriminating which nucleotide variants cause disease or contribute to phenotypic traits remains a major challenge in human genetics. In theory, any intragenic variant can affect RNA splicing by altering splicing regulatory elements (SREs). However, such alterations are often ignored, mainly because early SRE predictors proved inefficient. Here, we report the first large‐scale comparative evaluation of four user‐friendly SRE‐dedicated algorithms (QUEPASA, HEXplorer, SPANR, and HAL), tested both as standalone tools and in multiple combinations, on two independent benchmark datasets that together comprise more than 1,300 exonic variants studied at the messenger RNA level and mapping to 89 different disease‐causing genes. These methods display good predictive power, based on decision thresholds derived from receiver operating characteristic (ROC) curve analyses, with QUEPASA and HAL having the best accuracies either standalone or in combination. Overall, however, the four predictors performed very similarly, suggesting that all of them may be useful. Additionally, QUEPASA and HEXplorer may also be beneficial for predicting variant‐induced creation of pseudoexons deep within introns. Our study highlights the potential of SRE predictors as filtering tools for identifying disease‐causing candidates among the plethora of variants detected by high‐throughput DNA sequencing, and it provides guidance for their use in genomic medicine settings.
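
Decision thresholds can be derived from an ROC analysis in several ways. One common choice, assumed here and not confirmed as the criterion used in this study, is the cutoff maximizing Youden's J (sensitivity + specificity − 1). The scores and labels below are hypothetical predictor outputs for splice‐altering (1) and neutral (0) variants.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.
    Variants with score >= cutoff are called splice-altering."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative examples")
    best_cut, best_j = float("nan"), -1.0
    # Every observed score is a candidate cutoff on the ROC curve.
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and not y)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0]
print(youden_threshold(scores, labels))
```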

7.
MicroRNA (miRNA) expression is frequently deregulated in human disease; in contrast, disease‐associated miRNA mutations are understudied. We developed the Annotative Database of miRNA Elements (ADmiRE), which combines multiple existing and new biological annotations to aid prioritization of causal miRNA variation. We annotated 10,206 mature miRNA variants (3,257 within the seed region) from multiple large sequencing datasets, including gnomAD (15,496 genomes; 123,136 exomes). The pattern of miRNA variation closely resembles that of protein‐coding exonic regions, with no difference between intragenic and intergenic miRNAs (P = 0.56), and high‐confidence miRNAs demonstrate higher sequence constraint (P < 0.001). Conservation analysis across 100 vertebrates identified 765 highly conserved miRNAs that also have limited genetic variation in gnomAD. We applied ADmiRE to the TCGA PanCancerAtlas WES dataset, which contains over 10,000 individuals across 33 adult cancers, and annotated 1,267 germline (rare in gnomAD) and 1,492 somatic miRNA variants. Several miRNA families with deregulated gene expression in cancer, e.g., let‐7 and miR‐10, have low levels of both somatic and germline variants. In addition to known somatic miR‐142 mutations in hematologic cancers, we describe novel somatic miR‐21 mutations in esophageal cancers that impact downstream miRNA targets. Through the development of ADmiRE, we present a framework for the annotation and prioritization of miRNA variation in disease datasets.

8.
9.
Recent years have seen a drastic increase in the amount of available genomic sequence. Alongside this explosion, hundreds of computational tools have been developed to assess the impact of observed genetic variation. The Critical Assessment of Genome Interpretation (CAGI) provides a platform to evaluate the performance of these tools in experimentally relevant contexts. In the CAGI‐5 challenge assessing 38 missense variants affecting the human Pericentriolar material 1 protein (PCM1), our SNAP‐based submission was the top performer, although it did worse than expected from other evaluations. Here, we compare the CAGI‐5 submissions and 24 additional commonly used variant effect predictors to analyze the reasons for this observation. We identified per‐residue conservation, structural, and functional characteristics of PCM1 that may be responsible. As expected, predictors struggled to identify effect variants at nonconserved positions. They were also better at calling effect variants in a structurally rich region than in a less‐structured one; in the latter, they more often correctly identified benign variants than effect variants. Curiously, most of the protein was predicted to be functionally robust to mutation, a feature that likely makes it a harder problem for generalized variant effect predictors.

10.
Pathogenic genetic variants often primarily affect splicing. However, it remains difficult to quantitatively predict whether and how genetic variants affect splicing. In 2018, the fifth edition of the Critical Assessment of Genome Interpretation proposed two splicing prediction challenges based on experimental perturbation assays: Vex‐seq, assessing exon skipping, and MaPSy, assessing splicing efficiency. We developed a modular modeling framework, MMSplice, the performance of which was among the best on both challenges. Here we provide insights into the modeling assumptions of MMSplice and its individual modules. We furthermore illustrate how MMSplice can be applied in practice for individual genome interpretation, using the MMSplice VEP plugin and the Kipoi variant interpretation plugin, which are directly applicable to VCF files.

11.
Current genetic screening methods for inherited eye diseases focus on the coding exons of known disease genes (gene panels, clinical exome). These tests have a variable and often limited diagnostic rate, depending on the clinical presentation, the size of the gene panel, and our understanding of the inheritance of the disorder (with examples described in this issue). There are numerous possible explanations for the missing heritability of these cases, including undetected variants within the relevant gene (intronic, up/downstream, and structural variants), variants harbored in genes outside the targeted panel, intergenic variants, variants undetectable by the applied technology, complex/non‐Mendelian inheritance, and nongenetic phenocopies. In this article, we further explore and review methods to investigate these sources of missing heritability.

12.
Lysosomal acid lipase (LAL) deficiency is an autosomal recessive disorder caused by LIPA gene mutations that disrupt LAL activity. We performed in vitro functional testing of 149 LIPA variants to improve understanding of variant effects on LAL deficiency and to refine disease prevalence estimates. The chosen variants had been reported in the literature or in population databases. Functional testing was done by transient plasmid transfection and assessment of LAL activity. We assembled a set of 165 published genotypes of LAL‐deficient patients to evaluate how effectively this assay recapitulates genotype/phenotype relationships. Patients with rapidly progressive LAL deficiency showed negligible enzymatic activity (<1%), whereas patients with childhood/adult LAL deficiency typically had 1–7% average activity. We benchmarked six in silico variant effect prediction algorithms against these functional data; PolyPhen‐2 showed superior performance as measured by the area under the receiver operating characteristic curve. We used the functional data together with Genome Aggregation Database (gnomAD) allele frequencies to estimate LAL deficiency birth prevalence, yielding a range of 3.45–5.97 cases per million births in European‐ancestry populations. The low estimate considers only functionally assayed variants in gnomAD. The high estimate also computes allele frequencies for variants absent from gnomAD and uses in silico scores for unassayed variants. These prevalence estimates are lower than previously published ones, underscoring the rarity of LAL deficiency.
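
A minimal sketch of turning allele frequencies into a birth prevalence, assuming Hardy–Weinberg equilibrium and treating every variant judged deleterious (by assay or, for unassayed variants, by in silico score) as one pooled pathogenic allele class. The abstract does not spell out the exact formula used, and the frequencies in the example are hypothetical.

```python
from typing import Iterable


def birth_prevalence_per_million(pathogenic_allele_freqs: Iterable[float]) -> float:
    """Autosomal recessive birth prevalence per million births under
    Hardy-Weinberg equilibrium. The combined pathogenic allele frequency q is
    the sum over deleterious variants; affected births (homozygous or compound
    heterozygous) then occur at frequency q**2."""
    q = sum(pathogenic_allele_freqs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("combined allele frequency must lie in [0, 1]")
    return q * q * 1_000_000


# Hypothetical allele frequencies for three variants judged deleterious.
freqs = [1.2e-3, 3.0e-4, 5.0e-4]
print(f"{birth_prevalence_per_million(freqs):.2f} cases per million births")
```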

13.
Genome sequencing is becoming a routine clinical work‐up for diverse clinical conditions. A commonly used approach to highlight candidate variants with potential clinical implications is to search locus‐ and gene‐centric knowledge databases. Most web‐based applications allow a federated query across diverse databases for a single variant; however, sifting through a large number of genomic variants with combinations of filtering criteria is a substantial challenge. Here we describe the Clinical Genome and Ancestry Report (CGAR), an interactive web application developed to follow clinical interpretation workflows by organizing variants into seven categories: (1) reported disease‐associated variants, (2) rare‐ and high‐impact variants in putative disease‐associated genes, (3) secondary findings that the American College of Medical Genetics and Genomics recommends reporting back to patients, (4) actionable pharmacogenomic variants, (5) focused reports for candidate genes, (6) de novo variant candidates for trio analysis, and (7) germline and somatic variants implicated in cancer risk, diagnosis, treatment, and prognosis. For each variant, a comprehensive list of external links to variant‐centric and phenotype databases is provided. Furthermore, genotype‐derived ancestral composition is used to highlight allele frequencies from a matched population, since some disease‐associated variants vary widely in frequency between populations. CGAR is open‐source software and is available at https://tom.tch.harvard.edu/apps/cgar/.

14.
Null variants are prevalent within the human genome, and their accurate interpretation is critical for clinical management. In 2018, the ClinGen Sequence Variant Interpretation (SVI) Working Group refined PVS1, the only criterion with a very strong pathogenicity rating. To streamline PVS1 interpretation, we developed AutoPVS1, an automatic classification tool with a graphical user interface. The performance of AutoPVS1 was assessed using 56 variants manually curated by ClinGen's SVI Working Group, on which it achieved an interpretation concordance of 93% (52/56). A further analysis of 28,586 putative loss‐of‐function variants by AutoPVS1 demonstrated that at least 27.7% of them do not reach the very strong strength level: 17.5% because of variant‐specific issues and 10.2% because of disease mechanism considerations. Notably, 41.0% (1,936/4,717) of splicing variants were assigned a decreased preliminary PVS1 strength level, a significantly greater fraction than for frameshift variants (13.2%) and nonsense variants (10.8%). Our results reinforce the necessity of considering variant‐specific issues and disease mechanisms in variant interpretation, and they demonstrate that AutoPVS1 meets an urgent need by enabling biocurators to easily assign accurate, reliable, and reproducible PVS1 strength levels during variant interpretation. AutoPVS1 is publicly available at http://autopvs1.genetics.bgi.com/.

15.
We have created KvDB, a voltage‐gated potassium (Kv) channel‐specific database that houses natural and experimental variant data and includes highly curated multiple sequence alignments and additional analytical tools, such as structural variant mapping and transmembrane segment prediction. KvDB is available at www.bioinformatics.leeds.ac.uk/KvDB. Analyzing the characterized gene variants in terms of topological location revealed the following. The S4, S4–S5, S5, S5–S6, and S6 segments are the most likely to house disease‐causing variants. Neurological disorders are more likely to be caused by variants affecting voltage sensing, whereas cardiac disorders are more likely to be caused by variants in the pore. Long QT syndrome 2 (LQT2) is more often caused by N‐terminal variation, in a region containing a domain that affects deactivation, suggesting a potential disease mechanism. Conversely, a higher proportion of LQT1‐causing variants reside in S4–S5, suggesting the communication of voltage sensing to the pore as a disease mechanism. By structurally mapping functionally characterized variants, we also provide mechanistic insight into Kv channel function, identifying an intersubunit interaction that may be partly responsible for setting the activation voltage. Investigating phenotypically characterized variants that map to the same positions as functionally characterized ones indicates only a weak association between locations that cause disease and those that alter electrophysiological properties. Hum Mutat 31:1–10, 2010. © 2010 Wiley‐Liss, Inc.

16.
Current methods for resolving genetically distinct subclones in tumor samples require somatic mutations to be clustered by allelic frequencies, which are determined by applying a variant calling program to next‐generation sequencing data. Such programs were developed to accurately distinguish true polymorphisms and somatic mutations from the artifactual nonreference alleles introduced during library preparation and sequencing. However, numerous variant callers exist with no clear indication of the best performer for subclonal analysis, in which the accuracy of the assigned variant frequency is as important as correctly indicating whether the variant is present or not. Furthermore, sequencing depth (the number of times that a genomic position is sequenced) affects the ability to detect low‐allelic fraction variants and accurately assign their allele frequencies. We created two synthetic sequencing datasets, and sequenced real KRAS amplicons, with variants spiked in at specific ratios, to assess which caller performs best in terms of both variant detection and assignment of allelic frequencies. We also assessed the sequencing depths required to detect low‐allelic fraction variants. We found that VarScan2 performed best overall, with sequencing depths of 100×, 250×, 500×, and 1,000× required to accurately identify variants present at 10%, 5%, 2.5%, and 1%, respectively.
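
The reported depth thresholds can be put in perspective with simple arithmetic: at each depth/frequency pair the expected number of variant‐supporting reads is depth × allele fraction, that is 10, 12.5, 12.5, and 10 reads. The sketch below computes this and, under an idealized binomial sampling model that ignores sequencing errors and amplification bias (an assumption, not the study's method), the probability of observing at least a hypothetical minimum number of supporting reads.

```python
import math


def expected_alt_reads(depth: int, allele_fraction: float) -> float:
    """Mean number of reads supporting a variant at a given depth."""
    return depth * allele_fraction


def prob_at_least_k_alt_reads(depth: int, allele_fraction: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, allele_fraction); an idealized model
    with no sequencing error or amplification bias."""
    return 1.0 - sum(
        math.comb(depth, i) * allele_fraction**i * (1.0 - allele_fraction) ** (depth - i)
        for i in range(k)
    )


MIN_ALT_READS = 5  # hypothetical caller threshold, for illustration only
for depth, af in [(100, 0.10), (250, 0.05), (500, 0.025), (1000, 0.01)]:
    print(
        f"{depth}x at {af:.1%}: expect {expected_alt_reads(depth, af):.1f} alt reads, "
        f"P(>= {MIN_ALT_READS}) = {prob_at_least_k_alt_reads(depth, af, MIN_ALT_READS):.3f}"
    )
```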

17.
As the amount of human genomic sequence available from personal genomes and exomes has increased, so too has the observation of genomic positions having two or more alternative alleles, so‐called multiallelic sites. For portions of the haploid genome that are present in more than one copy, including segmental duplications, variation at such multisite variant positions becomes even more complex. Despite the frequency of multiallelic variants, a number of commonly used resources and tools in genomic research and diagnostics either do not support multiallelic variants at all or require special modifications to handle them. Here, we explore the frequency of multiallelic sites in large samples with whole‐exome sequencing and discuss potential consequences of failing to account for multiple variant alleles. We also briefly discuss some commonly used resources that fully support multiallelic sites.
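
One common way tools cope with multiallelic sites is to split each record into biallelic records and trim the bases shared by REF and ALT. The sketch below does this for the site‐level fields only; it assumes plain nucleotide alleles, ignores genotype and INFO fields, and does not left‐align indels within repeats, which requires the reference sequence.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def trim(v: Variant) -> Variant:
    """Drop bases shared by REF and ALT, keeping at least one base in each."""
    ref, alt, pos = v.ref, v.alt, v.pos
    # Trim a shared suffix first; the position does not change.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim a shared prefix, shifting the position to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return Variant(v.chrom, pos, ref, alt)


def split_multiallelic(chrom: str, pos: int, ref: str, alts: str) -> List[Variant]:
    """Split a site with a comma-separated ALT field into biallelic records."""
    return [trim(Variant(chrom, pos, ref, alt)) for alt in alts.split(",")]


# A hypothetical site carrying a 2-bp deletion, a 1-bp deletion and an SNV.
for record in split_multiallelic("chr1", 1000, "TCC", "T,TC,GCC"):
    print(record)
```

Note how TCC>TC collapses to TC>T: without trimming, the same 1‐bp deletion would not match a record written in minimal form elsewhere, one of the pitfalls of handling multiallelic sites naively.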

18.
Synonymous single‐nucleotide variants (SNVs), although they do not alter the encoded protein sequences, have been implicated in many genetic diseases. Experimental studies indicate that synonymous SNVs can lead to changes in the secondary and tertiary structures of DNA and RNA, thereby affecting translational efficiency, cotranslational protein folding, and the binding of DNA‐/RNA‐binding proteins. However, the importance of these various features in disease phenotypes is not clearly understood. Here, we have built a support vector machine (SVM) model, termed DDIG‐SN, to discriminate disease‐causing synonymous variants. The model was trained and evaluated on nearly 900 disease‐causing variants. The method achieves robust performance, with an area under the receiver operating characteristic curve of 0.84 and 0.85 for protein‐stratified 10‐fold cross‐validation and independent testing, respectively. We showed that disease‐causing effects in the immediate proximity of exon–intron junctions (1–3 bp) are driven by loss of splicing motif strength, whereas gain of splicing motif strength is the primary cause in regions farther from the splice site (4–69 bp). The method is available as part of the DDIG server at http://sparks-lab.org/ddig.
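
Protein‐stratified cross‐validation, as interpreted here (an assumption), keeps all variants from a given protein in the same fold so that a model is never tested on proteins it was trained on. The sketch below assigns folds greedily by protein group size; scikit‐learn's `GroupKFold` offers similar grouping. Protein IDs in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple


def protein_grouped_folds(proteins: Sequence[str], n_folds: int = 10) -> List[int]:
    """Assign each variant a fold index so that all variants from the same
    protein share a fold (no protein appears in both training and test)."""
    counts = Counter(proteins)
    if len(counts) < n_folds:
        raise ValueError("need at least as many distinct proteins as folds")
    fold_sizes = [0] * n_folds
    fold_of: Dict[str, int] = {}
    # Place the largest protein groups first, each into the currently smallest
    # fold, to keep fold sizes roughly balanced.
    for protein, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        fold = min(range(n_folds), key=lambda f: fold_sizes[f])
        fold_of[protein] = fold
        fold_sizes[fold] += n
    return [fold_of[p] for p in proteins]


def train_test_splits(
    proteins: Sequence[str], n_folds: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each protein-grouped fold."""
    folds = protein_grouped_folds(proteins, n_folds)
    for k in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        yield train, test


# Hypothetical protein IDs for 12 variants, split into 2 folds for brevity.
variant_proteins = ["P1"] * 5 + ["P2"] * 3 + ["P3"] * 2 + ["P4"] * 2
for train, test in train_test_splits(variant_proteins, n_folds=2):
    print(sorted({variant_proteins[i] for i in test}), len(train), len(test))
```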

19.
20.
There is growing interest in quantitative analysis of in vivo genetic toxicity dose‐response data and in the use of point‐of‐departure (PoD) metrics, such as the benchmark dose (BMD), for human health risk assessment (HHRA). Currently, multiple transgenic rodent (TGR) assay variants, employing different rodent strains and reporter transgenes, are used to assess chemically induced genotoxic effects in vivo. However, regulatory issues arise when different PoD values (e.g., lower BMD confidence limits, or BMDLs) are obtained for the same compound across different TGR assay variants. This study therefore employed the BMD approach to examine whether different TGR variants yield comparable genotoxic potency estimates. Review of over 2000 dose‐response datasets identified suitably matched dose‐response data for three compounds (ethyl methanesulfonate, EMS; N‐ethyl‐N‐nitrosourea, ENU; and dimethylnitrosamine, DMN) across four commonly used murine TGR variants (Muta™Mouse lacZ, Muta™Mouse cII, gpt delta, and BigBlue® lacI). Dose‐response analyses provided no conclusive evidence that the choice of TGR variant significantly influences the derived genotoxic potency estimate. This conclusion relied on comparing BMD confidence intervals rather than directly comparing PoD values (e.g., BMDLs). Comparisons with earlier work suggested that, with respect to potency determination, tissue choice is potentially more important than the choice of TGR assay variant. Scoring multiple tissues selected on the basis of supporting toxicokinetic information is therefore recommended. Finally, we used typical within‐group variances to estimate preliminary endpoint‐specific benchmark response (BMR) values across several TGR variants/tissues, and we discuss why such values are required for routine use of genetic toxicity PoDs in HHRA. Environ. Mol. Mutagen. 58:632–643, 2017. © 2017 Her Majesty the Queen in Right of Canada.
Environmental and Molecular Mutagenesis published by Wiley Periodicals, Inc.
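
The distinction between comparing BMDLs and comparing BMD confidence intervals can be sketched as follows. The overlap check is a simple heuristic, assumed for illustration rather than taken from the study, and the estimates in the example are hypothetical.

```python
from typing import NamedTuple


class BMDEstimate(NamedTuple):
    label: str   # hypothetical assay-variant label
    bmdl: float  # lower confidence limit of the BMD
    bmdu: float  # upper confidence limit of the BMD


def intervals_overlap(a: BMDEstimate, b: BMDEstimate) -> bool:
    """True if the two BMD confidence intervals share at least one value."""
    return a.bmdl <= b.bmdu and b.bmdl <= a.bmdu


def bmdl_ratio(a: BMDEstimate, b: BMDEstimate) -> float:
    """Fold difference between the two BMDLs (always >= 1)."""
    if a.bmdl <= 0 or b.bmdl <= 0:
        raise ValueError("BMDLs must be positive")
    return max(a.bmdl, b.bmdl) / min(a.bmdl, b.bmdl)


# Hypothetical potency estimates (arbitrary dose units) for one compound.
variant_a = BMDEstimate("assay variant A", bmdl=10.0, bmdu=40.0)
variant_b = BMDEstimate("assay variant B", bmdl=20.0, bmdu=60.0)
print(f"BMDL ratio: {bmdl_ratio(variant_a, variant_b):.1f}-fold")
print("confidence intervals overlap:", intervals_overlap(variant_a, variant_b))
```

Here the BMDLs differ twofold, yet the intervals overlap, so on this criterion the two assay variants would not be judged to give different potencies.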
