Similar literature
 19 similar articles found (search time: 15 ms)
1.
Several computational methods have been developed for predicting the effects of rapidly expanding variation data. Comparison of the performance of tools has been very difficult as the methods have been trained and tested with different datasets. Until now, unbiased and representative benchmark datasets have been missing. We have developed a benchmark database suite, VariBench, to overcome this problem. VariBench contains datasets of experimentally verified high‐quality variation data carefully chosen from literature and relevant databases. It provides the mapping of variation position to different levels (protein, RNA and DNA sequences, protein three‐dimensional structure), along with identifier mapping to relevant databases. VariBench contains the first benchmark datasets for variation effect analysis, a field of high importance in which many developments are under way. VariBench datasets can be used, for example, to test the performance of prediction tools as well as to train novel machine learning‐based tools. New datasets will be included and the community is encouraged to submit high‐quality datasets to the service. VariBench is freely available at http://structure.bmc.lu.se/VariBench.

2.
Recent years have seen a drastic increase in the amount of available genomic sequences. Alongside this explosion, hundreds of computational tools were developed to assess the impact of observed genetic variation. Critical Assessment of Genome Interpretation (CAGI) provides a platform to evaluate the performance of these tools in experimentally relevant contexts. In the CAGI‐5 challenge assessing the 38 missense variants affecting the human pericentriolar material 1 protein (PCM1), our SNAP‐based submission was the top performer, although it did worse than expected from other evaluations. Here, we compare the CAGI‐5 submissions, and 24 additional commonly used variant effect predictors, to analyze the reasons for this observation. We identified per residue conservation, structural, and functional PCM1 characteristics, which may be responsible. As expected, predictors had a hard time distinguishing effect variants in nonconserved positions. They were also better able to call effect variants in a structurally rich region than in a less‐structured one; in the latter, they more often correctly identified benign than effect variants. Curiously, most of the protein was predicted to be functionally robust to mutation, a feature that likely makes it a harder problem for generalized variant effect predictors.

3.
Computational prediction methods are widely used for the analysis of human genome sequence variants and their effects on gene/protein function, splice site aberration, pathogenicity, and disease risk. New methods are frequently developed. We believe that guidelines are essential for those writing articles about new prediction methods, as well as for those applying these tools in their research, so that the necessary details are reported. This will enable readers to gain the full picture of technical information, performance, and interpretation of results, and to facilitate comparisons of related methods. Here, we provide instructions on how to describe new methods, report datasets, and assess the performance of predictive tools. We also discuss what details of predictor implementation are essential for authors to understand. Similarly, these guidelines for the use of predictors provide instructions on what needs to be delineated in the text, as well as how researchers can avoid unwarranted conclusions. They are applicable to most prediction methods currently utilized. By applying these guidelines, authors will help reviewers, editors, and readers to more fully comprehend prediction methods and their use.

4.
Next‐generation sequencing (NGS) has become a powerful and efficient tool for routine mutation screening in clinical research. As each NGS test yields hundreds of variants, the current challenge is to meaningfully interpret the data and select potential candidates. Analyzing each variant while manually investigating several relevant databases to collect specific information is a cumbersome and time‐consuming process, and it requires expertise and familiarity with these databases. Thus, a tool that can seamlessly annotate variants with clinically relevant databases under one common interface would be of great help for variant annotation, cross‐referencing, and visualization. This tool would allow variants to be processed in an automated and high‐throughput manner and facilitate the investigation of variants in several genome browsers. Several analysis tools are available for raw sequencing‐read processing and variant identification, but an automated variant filtering, annotation, cross‐referencing, and visualization tool is still lacking. To fulfill these requirements, we developed DaMold, a Web‐based, user‐friendly tool that can filter and annotate variants and can access and compile information from 37 resources. It is easy to use, provides flexible input options, and accepts variants from NGS and Sanger sequencing as well as hotspots in VCF and BED formats. DaMold is available as an online application at http://damold.platomics.com/index.html, and as a Docker container and virtual machine at https://sourceforge.net/projects/damold/.
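The abstract above mentions that variants and hotspots can be supplied in VCF and BED formats. The two formats use different coordinate conventions (VCF positions are 1-based, BED intervals are 0-based and half-open), which is a common source of off-by-one errors. The following is a minimal sketch of the conversion; the function name and the example records are hypothetical and are not taken from DaMold.

```python
def vcf_record_to_bed(line):
    """Convert one tab-separated VCF data line to a (chrom, start, end) BED interval.

    Assumption: the interval should cover the REF allele only.
    VCF POS is 1-based; BED start is 0-based and BED end is exclusive.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    pos = int(fields[1])      # 1-based position of the first REF base
    ref = fields[3]           # reference allele
    start = pos - 1           # shift to 0-based
    end = start + len(ref)    # half-open end covers every REF base
    return chrom, start, end


# Hypothetical records: a single-nucleotide variant and a two-base deletion.
snv = "chr1\t100\t.\tA\tG\t.\t.\t."
deletion = "chr1\t100\t.\tACG\tA\t.\t.\t."
print(vcf_record_to_bed(snv))
print(vcf_record_to_bed(deletion))
```

Header lines (those starting with `#`) are not handled here and would need to be skipped before calling the function.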

5.
Predicting the impact of mutations on proteins remains an important problem. As part of the CAGI5 frataxin challenge, we evaluate the accuracy with which Provean, FoldX, and ELASPIC can predict changes in the Gibbs free energy of a protein using a limited data set of eight mutations. We find that different methods have distinct strengths and limitations, with no method being strictly superior to other methods on all metrics. ELASPIC achieves the highest accuracy while also providing a web interface which simplifies the evaluation and analysis of mutations. FoldX is slightly less accurate than ELASPIC but is easier to run locally, as it does not depend on external tools or datasets. Provean achieves reasonable results while being computationally less expensive than the other methods and not requiring a structure of the protein. In addition to methods submitted to the CAGI5 community experiment, and with the aim to inform about other methods with high accuracy, we also evaluate predictions made by Rosetta's ddg_monomer protocol, Rosetta's cartesian_ddg protocol, and thermodynamic integration calculations using the Amber package. ELASPIC still achieves the highest accuracy, while Rosetta's cartesian_ddg protocol appears to perform best in capturing the overall trend in the data.

6.
7.
The purpose of dbNSFP is to provide a one‐stop resource for functional predictions and annotations for human nonsynonymous single‐nucleotide variants (nsSNVs) and splice‐site variants (ssSNVs), and to facilitate the steps of filtering and prioritizing SNVs from a large list of SNVs discovered in an exome‐sequencing study. A list of all potential nsSNVs and ssSNVs based on the human reference sequence was created, and functional predictions and annotations were curated and compiled for each SNV. Here, we report a recent major update of the database to version 3.0. The SNV list has been rebuilt based on GENCODE 22 and currently the database includes 82,832,027 nsSNVs and ssSNVs. An attached database, dbscSNV, which compiled all potential human SNVs within splicing consensus regions and their deleteriousness predictions, adds another 15,030,459 potentially functional SNVs. Eleven prediction scores (MetaSVM, MetaLR, CADD, VEST3, PROVEAN, 4× fitCons, fathmm‐MKL, and DANN) and allele frequencies from the UK10K cohorts and the Exome Aggregation Consortium (ExAC), among others, have been added. The original seven prediction scores in v2.0 (SIFT, 2× Polyphen2, LRT, MutationTaster, MutationAssessor, and FATHMM) as well as many SNV and gene functional annotations have been updated. dbNSFP v3.0 is freely available at http://sites.google.com/site/jpopgen/dbNSFP.

8.
The complete coding region of the norepinephrine transporter (NET) gene was systematically screened for genetic variants in 137 unrelated individuals (including 46 probands with bipolar affective disorder and 45 schizophrenic probands, as well as 46 blood donors) using single-strand conformation analysis. We identified 13 DNA sequence variants, among them five missense substitutions. The missense substitutions Val69Ile, Thr99Ile, Val245Ile, Val449Ile, and Gly478Ser are located at putative transmembrane domains (TMD) 1, 2, 4, 9, and 10, respectively. The Thr99Ile substitution is at the 5th position of the putative leucine-zipper in TMD2. In a case-control study, the distribution of missense substitutions was found to be similar in 103 patients with bipolar affective disorder, in 228 schizophrenia patients and in 187 controls, indicating that the presence of these variants is not causally related to major psychiatric diseases. The detection of a highly polymorphic silent 1287G/A polymorphism was utilized to demonstrate biallelic expression of the NET in adult human brain. © 1996 Wiley-Liss, Inc.

9.
Next‐generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user‐friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high‐risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.

10.
Gaucher's disease (GD) is caused by a β‐glucocerebrosidase deficiency, leading to the accumulation of glucocerebroside in the reticuloendothelial system. The prevalence of GD in Tabuleiro do Norte (TN) (1:4000) is the highest in Brazil. The purpose of this study was to present evidence of consanguinity and founder effect for the G377S mutation (c.1246G>A) among GD patients in TN based on enzyme, molecular and genealogical studies. Between March 2009 and December 2010, 131 subjects at risk for GD (GC in dried blood ≤2.19 nmol/h/ml) and 5 confirmed GD patients from the same community were submitted for molecular analysis to characterize the genetic profile of the population. Based on the enzymatic and molecular analysis, the subjects were classified into three categories: affected (n = 5), carrier (n = 20) and non‐carrier (n = 111). All carriers were heterozygous (G377S/wt). Affected subjects were homozygous (G377S/G377S). The identification of a single mutation in carriers and homozygotes from different generations, the history of the community and the genealogy study suggest that the high prevalence of GD in this population may be due to a combination of consanguinity and founder effect for the G377S mutation.
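As a worked illustration of the genotype counts reported above (5 homozygous, 20 heterozygous, 111 non-carrier), the fraction of G377S alleles among the genotyped subjects can be obtained by simple allele counting. This is only a sketch: the function name is hypothetical, and because the subjects were selected as at-risk individuals or confirmed patients, the result describes this screened sample and should not be read as a population allele frequency.

```python
def mutant_allele_fraction(homozygous, heterozygous, non_carriers):
    """Fraction of mutant alleles among all alleles of the genotyped subjects.

    Each subject carries two alleles; homozygotes contribute two mutant
    alleles and heterozygotes contribute one.
    """
    subjects = homozygous + heterozygous + non_carriers
    mutant_alleles = 2 * homozygous + heterozygous
    return mutant_alleles / (2 * subjects)


# Counts taken from the abstract: affected = 5, carrier = 20, non-carrier = 111.
fraction = mutant_allele_fraction(homozygous=5, heterozygous=20, non_carriers=111)
print(f"{fraction:.3f}")  # 30 mutant alleles out of 272, i.e. 0.110
```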

11.
Mutations in the basal core promoter (BCP) and precore (PC) regions are associated with persistent and intermittently high hepatitis B virus (HBV) replication in several patients. The variability in the functional domains of the BCP and PC regions of HBV and their association with disease progression and clinical outcome were assessed in Eastern India, a unique region where three HBV genotypes, A, D, and C, are prevalent among the same ethnic group. PCR amplification and direct sequencing of the BCP and PC regions were done on sera obtained from 130 HBsAg positive subjects with different clinical presentations. Associations of the apparent risk factors with clinical advancement were evaluated by statistical methods including multiple logistic regression analyses (MLR). HBV genotype A was present in 33.08%, C in 25.38%, and D in 41.54% of cases. Genotypes A and C were associated with a higher rate of T1762/A1764 mutations than the most predominant genotype D. HBeAg negative state was associated with a considerably higher rate of C1753 mutation. T1762/A1764 along with C1753 was common among cirrhosis cases and T1762/A1764 without C1753 was frequent among chronic liver disease cases. No significant association was found between the A1896 point mutation and clinical status. Multivariate analysis revealed that the T1762/A1764 double mutation, HBV/A, age ≥25 years, C1753 and A1899 were critical factors for clinical advancement, while age ≥25 years and C1753 were significant predictors for cirrhosis in comparison with chronic liver disease. In conclusion, the analysis of the BCP variability may help in monitoring the progression towards advanced liver disease in Eastern Indian patients.

12.
The completion of the human genome project at the beginning of the 21st century, along with the rapid advancement of sequencing technologies thereafter, has resulted in exponential growth of biological data. In genetics, this has given rise to numerous variation databases, created to store and annotate the ever‐expanding dataset of known mutations. Usually, these databases focus on variation at the sequence level. Few databases focus on the analysis of variation at the 3D level, that is, mapping, visualizing, and determining the effects of variation in protein structures. Additionally, these Web servers seldom incorporate tools to help analyze these data. Here, we present the Human Mutation Analysis (HUMA) Web server and database. HUMA integrates sequence, structure, variation, and disease data into a single, connected database. A user‐friendly interface provides click‐based data access and visualization, whereas a RESTful Web API provides programmatic access to the data. Tools have been integrated into HUMA to allow initial analyses to be carried out on the server. Furthermore, users can upload their private variation datasets, which are automatically mapped to public data and can be analyzed using the integrated tools. HUMA is freely accessible at https://huma.rubi.ru.ac.za.

13.
To interpret genetic variants discovered from next‐generation sequencing, integration of heterogeneous information is vital for success. This article describes a framework named PERCH (Polymorphism Evaluation, Ranking, and Classification for a Heritable trait), available at http://BJFengLab.org/. It can prioritize disease genes by quantitatively unifying a new deleteriousness measure called BayesDel, an improved assessment of the biological relevance of genes to the disease, a modified linkage analysis, a novel rare‐variant association test, and a converted variant call quality score. It supports data that contain various combinations of extended pedigrees, trios, and case–controls, and allows for a reduced penetrance, an elevated phenocopy rate, liability classes, and covariates. BayesDel is more accurate than PolyPhen2, SIFT, FATHMM, LRT, Mutation Taster, Mutation Assessor, PhyloP, GERP++, SiPhy, CADD, MetaLR, and MetaSVM. The overall approach is faster and more powerful than the existing quantitative method pVAAST, as shown by the simulations of challenging situations in finding the missing heritability of a complex disease. This framework can also classify variants of unknown significance (variants of uncertain significance) by quantitatively integrating allele frequencies, deleteriousness, association, and co‐segregation. PERCH is a versatile tool for gene prioritization in gene discovery research and variant classification in clinical genetic testing.

14.
Primary immunodeficiency diseases refer to inborn errors of immunity (IEI) that affect the normal development and function of the immune system. The phenotypical and genetic heterogeneity of IEI has made their diagnosis challenging. Hence, whole‐exome sequencing (WES) was employed in this pilot study to identify the genetic etiology of 30 pediatric patients clinically diagnosed with IEI. The potential causative variants identified by WES were validated using Sanger sequencing. Genetic diagnosis was attained in 46.7% (14 of 30) of the patients and categorized into autoinflammatory disorders (n = 3), diseases of immune dysregulation (n = 3), defects in intrinsic and innate immunity (n = 3), predominantly antibody deficiencies (n = 2), combined immunodeficiencies with associated and syndromic features (n = 2) and immunodeficiencies affecting cellular and humoral immunity (n = 1). Of the 15 genetic variants identified, two were novel variants. Genetic findings differed from the provisional clinical diagnoses in seven cases (50.0%). This study showed that WES enhances the capacity to diagnose IEI, allowing more patients to receive appropriate therapy and disease management.

15.
The analysis of genome-wide genetic association studies generally starts with univariate statistical tests of each single-nucleotide polymorphism. The standard approach is the Cochran-Armitage trend test or its logistic regression equivalent, although this approach can lose considerable power if the underlying genetic model is not additive. An alternative is the MAX test, which is robust against the three basic modes of inheritance. Here, the asymptotic distribution of the MAX test is derived using the generalized linear model together with the Delta method and multiple contrasts. The approach is applicable to binary, quantitative, and survival traits. It may be used for unrelated individuals, family-based studies, and matched pairs. The approach provides point and interval effect estimates and allows selecting the most plausible genetic model using the minimum P-value. R code is provided. A Monte-Carlo simulation study shows that the asymptotic MAX test framework meets type I error levels well, has good power, and good model selection properties for minor allele frequencies ≥0.3. Pearson's χ²-test is superior for lower minor allele frequencies with low frequencies for the rare homozygous genotype. In these cases, the model selection procedure should be used with caution. The use of the MAX test is illustrated by reanalyzing findings from seven genome-wide association studies including case–control, matched pairs, and quantitative trait data.
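To make the idea behind the MAX test concrete, the sketch below computes the classical Cochran-Armitage trend statistic on a 2×3 case-control genotype table under recessive, additive, and dominant genotype scores and takes the largest absolute value. It is a simplified illustration for a binary trait only and is not the generalized-linear-model derivation described in the abstract, nor the authors' R code. The function names and the example table are hypothetical, and no P-value is computed, because the null distribution of the maximum is not standard normal.

```python
import math

# Genotype scores for 0, 1, and 2 copies of the tested allele
# under the three basic modes of inheritance.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 1.0, 2.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic Z for one set of genotype scores."""
    n = [r + s for r, s in zip(cases, controls)]   # genotype column totals
    R, S = sum(cases), sum(controls)               # row totals
    N = R + S
    u = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    var = sum(x * x * ni * (N - ni) for x, ni in zip(scores, n))
    var -= 2 * sum(
        scores[i] * scores[j] * n[i] * n[j]
        for i in range(3)
        for j in range(i + 1, 3)
    )
    var *= R * S / N
    if var <= 0:
        # Degenerate table (for example, no rare homozygotes under the
        # recessive scores): the statistic is undefined, so return 0.
        return 0.0
    return u / math.sqrt(var)


def max_statistic(cases, controls):
    """Return (best model, MAX statistic, per-model Z values)."""
    z = {model: trend_z(cases, controls, x) for model, x in SCORES.items()}
    best = max(z, key=lambda model: abs(z[model]))
    return best, abs(z[best]), z


# Hypothetical genotype counts (0, 1, 2 copies of the tested allele).
cases = (20, 30, 50)
controls = (40, 40, 20)
best, max_z, z = max_statistic(cases, controls)
print(best, round(max_z, 3))
```

Picking the model with the largest |Z| corresponds to picking the smallest marginal P-value, since each Z is compared against the same reference distribution. Assessing the significance of the maximum itself requires the joint distribution of the three correlated statistics, which is the subject of the paper.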

16.
New methods of restriction fragment length polymorphism analysis were developed to distinguish polymorphic variants C190T, G191A, G857A, and 859Del of the NAT2 gene located close to each other. Translated from Byulleten’ Eksperimental’noi Biologii i Meditsiny, Vol. 146, No. 10, pp. 400–403, October, 2008

17.
We were invited to comment on the article by May and Klonsky (2016) titled “What Distinguishes Suicide Attempters From Suicide Ideators? A Meta‐Analysis of Potential Factors.” We were delighted to see the authors calling attention to the fact that the risk factors for onset of suicide ideation differ from those for the transition from suicide ideation to attempt. Our commentary focuses on three points: (a) despite the authors' framing it as such, this is not a new research question, but one with a substantial history; (b) this meta‐analysis excludes most of the available data on this topic and focuses instead on results from small and nonrepresentative studies, limiting the validity of the inferences that can be drawn from this analysis; and (c) this meta‐analysis was designed in a way that precludes the examination of actual risk factors for the transition from suicidal thought to action. We conclude by discussing some important considerations for future research.

18.
19.
As a rule, recombination in bread wheat (Triticum aestivum L.) is low in proximal and high in distal regions of chromosomes. Recombination may be enhanced in proximal regions by using deletion (del) chromosomes deficient for a distal part of a chromosome arm. The chromosome del5BL-11 derived from Chinese Spring (CS) is missing 41% of the distal long arm. This line was made polymorphic by crossing with a stock in which chromosome 5B of CS (5BCS) is substituted for chromosome 5B of T. turgidum ssp. dicoccoides origin (5B T. dic). Three recombinant del5BL-11 (del5BL-11rec) lines were isolated, all resulting from localized recombination between loci Xbcd926 and XksuH1. In del5BL-11rec, the centromere to fraction length (FL) 0.53 (C-FL0.53) segment is derived from 5B T. dic and the distal region of FL 0.55–0.59 is from 5BCS. Genetic recombination for the C-FL0.53 interval was assayed in segregating progenies from 5BCS/5B T. dic and del5BL-11/del5BL-11rec crosses using polymorphic markers and for the FL 0.55–0.59 interval in the del5BL-11/del5BL-11rec cross from chiasma counts. The pairing data and comparative mapping of normal 5B and del5BL-11 indicated that the increase in recombination was restricted to the FL 0.55–0.59 interval of the del5BL-11 chromosome. No significant increase in recombination in more proximal regions was observed, although the order of several markers that cosegregated in the normal 5B map was resolved in the del5BL-11 map. The presented data show that recombination in proximal, usually low-recombination, regions can be increased by placing them close to the chromosome end.
