Similar articles
 20 similar articles found (search time: 15 ms)
1.
Most common hereditary diseases in humans are complex and multifactorial. Large‐scale genome‐wide association studies based on SNP genotyping have only identified a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF, <5%), which are not included in the common genotyping platforms, may contribute substantially to the genetic variation of these diseases. Next‐generation sequencing, which would allow the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost‐effective protocols for using next‐generation sequencing in association mapping studies based on pooled and un‐pooled samples, and identify optimal designs with respect to total number of individuals, number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon‐capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next‐generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that with a fixed cost, sequencing many individuals at a shallower depth with a larger pool size achieves higher power than sequencing a small number of individuals at a higher depth with a smaller pool size, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next‐generation sequencing. Genet. Epidemiol. 34: 479–491, 2010. © 2010 Wiley‐Liss, Inc.

2.
Family‐based designs enriched with affected subjects and disease associated variants can increase statistical power for identifying functional rare variants. However, few rare variant analysis approaches are available for time‐to‐event traits in family designs, and none of them is applicable to the X chromosome. We developed novel pedigree‐based burden and kernel association tests for time‐to‐event outcomes with right censoring for pedigree data, referred to as FamRATS (family‐based rare variant association tests for survival traits). Cox proportional hazard models were employed to relate a time‐to‐event trait with rare variants with flexibility to encompass all ranges and collapsing of multiple variants. In addition, robustness to violations of the proportional hazards assumption was investigated for the proposed tests and four existing tests, including the conventional population‐based Cox proportional model and the burden, kernel, and sum of squares statistic (SSQ) tests for family data. The proposed tests can be applied to large‐scale whole‐genome sequencing data. They are appropriate for practical use under a wide range of misspecified Cox models, as well as for population‐based, pedigree‐based, or hybrid designs. In our extensive simulation study and data example, we showed that the proposed kernel test is the most powerful and robust choice among the proposed burden test and the existing four rare variant survival association tests. When applied to the Diabetes Heart Study, the proposed tests found that exome variants of the JAK1 gene on chromosome 1 showed the most significant association with age at onset of type 2 diabetes in the exome‐wide analysis.

3.
Functional variants change the protein product or the expression of genes. Due to the latest advances in sequencing technology, most known functional variants can now be assayed in a cost‐effective manner. However, to fully use the information from functional variants, researchers need to model the joint effect of these variants. In this article, we propose methods that model the action/interaction of loss‐of‐function (LOF) mutations, i.e., those mutations that eliminate the protein product of a gene. When multiple LOFs occur in the same causal gene/region, their effect on a phenotype might depend on whether these mutations lie on the same DNA strand/haplotype. When compared to LOFs occurring on the same strand, if these mutations lie on different strands, both copies of the gene are impaired and the impact on the relevant phenotypes is likely to be more severe. To use the information from LOF strand colocalization, we propose three methods that utilize the information from the estimated number of affected strands. We compare the performance of the proposed and competing methods by using simulations of common and rare LOF variants. Two of the proposed methods exhibited desirable power profiles, the first for both common and rare LOFs and the second only for common LOFs. One of the existing methods, collapsed double heterozygosity, exhibits good power to detect compound models for rare variants, especially when no haplotype harbors two or more rare alleles. Consequently, we recommend using these three methods for the analysis of functional variants coming from sequencing studies.
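
The distinction between LOFs on the same strand and on different strands can be made concrete with a small Python sketch. It assumes phased haplotypes coded as 0/1 LOF indicators per site; the function name `affected_strands` and the coding are illustrative assumptions, not the authors' implementation, and the abstract's methods work with an estimated rather than an observed strand count.

```python
from typing import Sequence


def affected_strands(hap_a: Sequence[int], hap_b: Sequence[int]) -> int:
    """Count how many of the two haplotypes carry at least one LOF allele.

    Each haplotype is a sequence of 0/1 indicators, one per variant site
    (hypothetical coding: 1 = LOF allele present on that strand).
    Returns 0, 1, or 2.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same variant sites")
    return int(any(hap_a)) + int(any(hap_b))


# Two LOF alleles on the same strand: the other gene copy is still intact.
cis = affected_strands([1, 1, 0], [0, 0, 0])

# Two LOF alleles on different strands: both gene copies are impaired.
trans = affected_strands([1, 0, 0], [0, 1, 0])

print(cis, trans)
```

Both individuals carry two LOF alleles, so a plain allele count cannot separate them, whereas the strand count does.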

4.
Imputation is widely used for obtaining information about rare variants. However, imputed rare variants have low accuracy, and inaccurately imputed rare variants may distort the results of region‐based association tests. Therefore, we developed a pre‐collapsing imputation method (PreCimp) to improve the accuracy of imputation by using collapsed variables. Briefly, collapsed variables are generated using rare variants in the reference panel, and a new reference panel is constructed by inserting pre‐collapsed variables into the original reference panel. The subsequent imputation analysis provides the imputed genotypes of the collapsed variables. We demonstrated the performance of PreCimp on 5,349 genotyped samples using a Korean population specific reference panel including 848 samples of exome sequencing, Affymetrix 5.0, and exome chip. PreCimp outperformed a traditional post‐collapsing method that collapses imputed variants after single rare variant imputation analysis. Compared with the post‐collapsing method, PreCimp was shown to increase imputation accuracy, in relative terms, by about 3.4–6.3% when dosage r² is between 0.6 and 0.8, 10.9–16.1% when dosage r² is between 0.4 and 0.6, and 21.4–129.4% when dosage r² is below 0.4.
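
The pre‐collapsing step can be illustrated with a minimal Python sketch. The abstract does not state the collapsing rule, so this assumes a carrier indicator (1 if a sample carries any rare allele in the region) and genotypes coded as minor allele counts; `collapse_rare`, `augment_panel`, and the frequency threshold are hypothetical names and values chosen for illustration only.

```python
def collapse_rare(panel, maf_threshold):
    """Build one collapsed variable from the rare variants of a region.

    panel: list of samples, each a list of minor-allele counts (0/1/2)
           per variant (assumed coding).
    Returns (collapsed, rare_idx): a 0/1 carrier indicator per sample and
    the column indices that were treated as rare.
    """
    n_samples = len(panel)
    n_variants = len(panel[0])
    rare_idx = []
    for j in range(n_variants):
        freq = sum(row[j] for row in panel) / (2 * n_samples)
        if 0 < freq < maf_threshold:
            rare_idx.append(j)
    collapsed = [int(any(row[j] > 0 for j in rare_idx)) for row in panel]
    return collapsed, rare_idx


def augment_panel(panel, collapsed):
    """Insert the collapsed variable as an extra column of the panel."""
    return [row + [c] for row, c in zip(panel, collapsed)]


reference = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
collapsed, rare_idx = collapse_rare(reference, maf_threshold=0.2)
new_reference = augment_panel(reference, collapsed)
print(rare_idx, collapsed)
```

In this toy panel the first and third variants fall below the threshold, so the collapsed column marks the two samples that carry either of them; an imputation tool would then treat that column like any other marker.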

5.
Large genome‐wide association studies (GWAS) have been performed to detect common genetic variants involved in common diseases, but most of the variants found this way account for only a small portion of the trait variance. Furthermore, candidate gene‐based resequencing suggests that many rare genetic variants contribute to the trait variance of common diseases. Here we propose two designs, sibpair and unrelated‐case designs, to detect rare genetic variants in either a candidate gene‐based or genome‐wide association analysis. First we show that we can detect and classify together rare risk haplotypes using a relatively small sample with either of these designs, and then have increased power to test association in a larger case‐control sample. This method can also be applied to resequencing data. Next we apply the method to the Wellcome Trust Case Control Consortium (WTCCC) coronary artery disease (CAD) and hypertension (HT) data, the latter being the only trait for which no genome‐wide association evidence was reported in the original WTCCC study, and identify one interesting gene associated with HT and four associated with CAD at a genome‐wide significance level of 5%. These results suggest that searching for rare genetic variants is feasible and can be fruitful in current GWAS, candidate gene studies or resequencing studies. Genet. Epidemiol. 34: 171–187, 2010. © 2009 Wiley‐Liss, Inc.

6.
Recent sequencing efforts have focused on exploring the influence of rare variants on complex diseases. Gene‐level tests that aggregate information across rare variants within a gene have become attractive for enriching the rare variant association signal. Among them, the sequence kernel association test (SKAT) has proved to be a very powerful method for jointly testing multiple rare variants within a gene. In this article, we explore an alternative SKAT. We propose to use the univariate likelihood ratio statistics from the marginal model for individual variants as input into the kernel association test. We show how to compute its significance P‐value efficiently based on the asymptotic chi‐square mixture distribution. We demonstrate through extensive numerical studies that the proposed method has competitive performance. Its usefulness is further illustrated with application to associations between rare exonic variants and type 2 diabetes (T2D) in the Atherosclerosis Risk in Communities (ARIC) study. We identified an exome‐wide significant rare variant set in the gene ZZZ3 worthy of further investigation.
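
For orientation, the standard score‐based SKAT statistic that this abstract takes as its starting point can be sketched in plain Python. This is the usual SKAT form with an intercept‐only null model for a continuous trait, not the likelihood‐ratio variant the authors propose; the permutation P‐value is a simple stand‐in for the asymptotic chi‐square mixture, and all function names are illustrative.

```python
import random


def beta_1_25_weight(maf):
    """Beta(1, 25) density at the MAF, a common choice for up-weighting rare variants."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(y, G, weights):
    """Q = sum_j (w_j * S_j)^2 with S_j = sum_i G[i][j] * (y_i - mean(y)).

    y: trait values; G: per-sample lists of allele counts;
    weights: one weight per variant.
    """
    mean_y = sum(y) / len(y)
    resid = [v - mean_y for v in y]
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(G[i][j] * resid[i] for i in range(len(y)))
        q += (w * score) ** 2
    return q


def permutation_p_value(y, G, weights, n_perm=999, seed=0):
    """Permutation P-value: shuffle the trait and recompute Q (assumed substitute)."""
    rng = random.Random(seed)
    observed = skat_q(y, G, weights)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if skat_q(y_perm, G, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


y = [1.0, 2.0, 3.0, 4.0]
G = [[1, 0], [0, 0], [0, 1], [1, 0]]
q_flat = skat_q(y, G, [1.0, 1.0])
p_perm = permutation_p_value(y, G, [1.0, 1.0], n_perm=199)
print(q_flat)
```

Because each variant's score is squared before summing, variants with opposite effect directions do not cancel, which is the property that separates kernel tests from burden tests.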

7.
Next generation sequencing technology has enabled the paradigm shift in genetic association studies from the common disease/common variant to common disease/rare‐variant hypothesis. Analyzing individual rare variants is known to be underpowered; therefore association methods have been developed that aggregate variants across a genetic region, which for exome sequencing is usually a gene. The foreseeable widespread use of whole genome sequencing poses new challenges in statistical analysis. It calls for new rare‐variant association methods that are statistically powerful, robust against high levels of noise due to inclusion of noncausal variants, and yet computationally efficient. We propose a simple and powerful statistic that combines the disease‐associated P‐values of individual variants using a weight that is the inverse of the expected standard deviation of the allele frequencies under the null. This approach, dubbed the Sigma‐P method, is extremely robust to the inclusion of a high proportion of noncausal variants and is also powerful when both detrimental and protective variants are present within a genetic region. The performance of the Sigma‐P method was tested using simulated data based on realistic population demographic and disease models and its power was compared to several previously published methods. The results demonstrate that this method generally outperforms other rare‐variant association methods over a wide range of models. Additionally, sequence data on the ANGPTL family of genes from the Dallas Heart Study were tested for associations with nine metabolic traits and both known and novel putative associations were uncovered using the Sigma‐P method.

8.
Association analysis, which aims to detect genetic associations with observable traits, has played an increasing part in understanding the genetic basis of diseases. Among these methods, haplotype‐based association studies are believed to possess prominent advantages, especially for rare diseases in case‐control studies. However, when haplotypes are modeled, these studies are subject to statistical problems caused by rare haplotypes. Fortunately, haplotype clustering offers an appealing solution. In this research, we have developed a new haplotype similarity measure suited to the “affinity propagation” clustering algorithm, which accounts for rare haplotypes appropriately and thereby controls the degrees‐of‐freedom problem. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers merits in detecting disease‐marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34: 633–641, 2010. © 2010 Wiley‐Liss, Inc.

9.
For complex traits, most associated single nucleotide variants (SNV) discovered to date have a small effect, and detection of association is only possible with large sample sizes. Because of patient confidentiality concerns, it is often not possible to pool genetic data from multiple cohorts, and meta‐analysis has emerged as the method of choice to combine results from multiple studies. Many meta‐analysis methods are available for single SNV analyses. As new approaches allow the capture of low frequency and rare genetic variation, it is of interest to jointly consider multiple variants to improve power. However, for the analysis of haplotypes formed by multiple SNVs, meta‐analysis remains a challenge, because different haplotypes may be observed across studies. We propose a two‐stage meta‐analysis approach to combine haplotype analysis results. In the first stage, each cohort estimates haplotype effect sizes in a regression framework, accounting for relatedness among observations if appropriate. For the second stage, we use a multivariate generalized least squares meta‐analysis approach to combine haplotype effect estimates from multiple cohorts. Haplotype‐specific association tests and a global test of independence between haplotypes and traits are obtained within our framework. We demonstrate through simulation studies that we control the type‐I error rate, and our approach is more powerful than inverse variance weighted meta‐analysis of single SNV analysis when haplotype effects are present. We replicate a published haplotype association between fasting glucose‐associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium and we provide more precise haplotype effect estimates.
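
The inverse variance weighted meta‐analysis used as the comparator in this abstract is simple enough to sketch in Python. This is the univariate fixed‐effect form for a single SNV, not the authors' multivariate generalized least squares method; the function name and the example effect sizes are made up for illustration.

```python
from math import erfc, sqrt


def ivw_meta(betas, ses):
    """Fixed-effect inverse variance weighted meta-analysis of one effect.

    betas: per-cohort effect estimates; ses: their standard errors.
    Returns (pooled_beta, pooled_se, two_sided_p) under a normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    two_sided_p = erfc(abs(z) / sqrt(2.0))
    return pooled_beta, pooled_se, two_sided_p


# Illustrative numbers only: two cohorts with equal precision.
pooled_beta, pooled_se, p_value = ivw_meta([0.10, 0.20], [0.1, 0.1])
print(pooled_beta, pooled_se)
```

Each cohort shares only an estimate and a standard error, which is why this style of analysis sidesteps the confidentiality constraints on pooling individual‐level data.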

10.
A combination of common and rare variants is thought to contribute to genetic susceptibility to complex diseases. Recently, next‐generation sequencers have greatly lowered sequencing costs, providing an opportunity to identify rare disease variants in large genetic epidemiology studies. At present, it is still expensive and time consuming to resequence large numbers of individual genomes. However, given that next‐generation sequencing technology can provide accurate estimates of allele frequencies from pooled DNA samples, it is possible to detect associations of rare variants using pooled DNA sequencing. Current statistical approaches to the analysis of associations with rare variants are not designed for use with pooled next‐generation sequencing data. Hence, they may not be optimal in terms of both validity and power. Therefore, we propose here a new statistical procedure to analyze the output of pooled sequencing data. The test statistic can be computed rapidly, making it feasible to test the association of a large number of variants with disease. By simulation, we compare this approach to Fisher's exact test based either on pooled or individual genotypic data. Our results demonstrate that the proposed method provides good control of the Type I error rate, while yielding substantially higher power than Fisher's exact test using pooled genotypic data for testing rare variants, and has similar or higher power than that of Fisher's exact test using individual genotypic data. Our results also provide guidelines on how various parameters of the pooled sequencing design affect the efficiency of detecting associations. Genet. Epidemiol. 34: 492–501, 2010. © 2010 Wiley‐Liss, Inc.
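
The baseline in this comparison, Fisher's exact test on a 2×2 table of allele counts, can be written with the Python standard library alone. The sketch below is that baseline, not the authors' proposed procedure, and it assumes the allele counts are known exactly, whereas pooled sequencing only yields estimates; the function name and the example counts are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout: a, b = minor/major allele counts in cases;
    c, d = minor/major allele counts in controls.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Tiny table that can be checked by hand: only the two extreme tables count.
p_small = fisher_exact_two_sided(3, 0, 0, 3)

# Illustrative rare-variant counts: 12 of 2000 case alleles vs 3 of 2000 control alleles.
p_rare = fisher_exact_two_sided(12, 1988, 3, 1997)
print(p_small)
```

For the small table, each extreme arrangement has probability 1/20, so the two‐sided P‐value is 0.1.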

11.
Since the development of next generation sequencing (NGS) technology, researchers have been extending their efforts on genome‐wide association studies (GWAS) from common variants to rare variants to find the missing inheritance. Although various statistical methods have been proposed to analyze rare variants data, they generally face difficulties for complex disease models involving multiple genes. In this paper, we propose a tree‐based analysis of rare variants (TARV) that adopts a nonparametric disease model and is capable of exploring gene–gene interactions. We found that TARV outperforms the sequence kernel association test (SKAT) in most of our simulation scenarios, and by notable margins in some cases. By applying TARV to the Study of Addiction: Genetics and Environment (SAGE) data, we successfully detected gene CTNNA2 and its 43 specific variants that increase the risk of alcoholism in women, with an odds ratio (OR) of 1.94. This gene had not previously been detected in the SAGE data. Post hoc literature search also supports the role of CTNNA2 as a likely risk gene for alcohol addiction. In addition, we also detected a plausible protective gene CNTNAP2, whose 97 rare variants can reduce the risk of alcoholism in women, with an OR of 0.55. These findings suggest that TARV can be effective in dissecting genetic variants for complex diseases using rare variants data.

12.
Advancement in sequencing technology enables the study of association between complex disorder phenotypes and single‐nucleotide polymorphisms with rare mutations. However, rare genetic variants have extremely small variance, which impairs the testing power of traditional statistical methods. We introduce a W‐test collapsing method to evaluate rare‐variant association by measuring the distributional differences between cases and controls through combined log of odds ratio within a genomic region. The method is model‐free and inherits a chi‐squared distribution with degrees of freedom estimated from bootstrapped samples of the data, and allows for fast and accurate P‐value calculation without the need for permutations. The proposed method is compared with the Weighted‐Sum Statistic and Sequence Kernel Association Test on simulation datasets, and shows good performance and significantly faster computing speed. In an application to a real next‐generation sequencing dataset on hypertensive disorder, it identified genes with interesting biological functions associated with metabolism disorder and inflammation, including MACROD1, NLRP7, AGK, PAK6, and APBB1. The proposed method offers an efficient and effective way for testing rare genetic variants in whole exome sequencing datasets.

13.
Whole genome association studies (WGAS) have surged in popularity in recent years as technological advances have made large‐scale genotyping more feasible and as new exciting results offer tremendous hope and optimism. The logic of WGAS rests upon the common disease/common variant (CD/CV) hypothesis. Detection of association under the common disease/rare variant (CD/RV) scenario is much harder, and the current practices of WGAS may be underpowered without large enough sample sizes. In this article, we propose a generalized linear model with regularization (rGLM) approach for detecting disease‐haplotype association using unphased single nucleotide polymorphisms data that is applicable to both CD/CV and CD/RV scenarios. We borrow a dimension‐reduction method from the data mining and statistical learning literature, but use it for the purpose of weeding out haplotypes that are not associated with the disease so that the associated haplotypes, especially those that are rare, can stand out and be accounted for more precisely. By using high‐dimensional data analysis techniques, which are frequently employed in microarray analyses, interacting effects among haplotypes in different blocks can be investigated without much concern about the sample size being overwhelmed by the number of haplotype combinations. Our simulation study demonstrates the gain in power for detecting associations with moderate sample sizes. For detecting association under CD/RV, regression type methods such as that implemented in hapassoc may fail to provide coefficient estimates for rare associated haplotypes, resulting in a loss of power compared to rGLM. Furthermore, our results indicate that rGLM can uncover the associated variants much more frequently than can hapassoc. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.

14.
Whole‐exome sequencing using family data has identified rare coding variants in Mendelian diseases or complex diseases with Mendelian subtypes, using filters based on variant novelty, functionality, and segregation with the phenotype within families. However, formal statistical approaches are limited. We propose a gene‐based segregation test (GESE) that quantifies the uncertainty of the filtering approach. It is constructed using the probability of segregation events under the null hypothesis of Mendelian transmission. This test takes into account different degrees of relatedness in families, the number of functional rare variants in the gene, and their minor allele frequencies in the corresponding population. In addition, a weighted version of this test allows incorporating additional subject phenotypes to improve statistical power. We show via simulations that the GESE and weighted GESE tests maintain appropriate type I error rate, and have greater power than several commonly used region‐based methods. We apply our method to whole‐exome sequencing data from 49 extended pedigrees with severe, early‐onset chronic obstructive pulmonary disease (COPD) in the Boston Early‐Onset COPD study (BEOCOPD) and identify several promising candidate genes. Our proposed methods show great potential for identifying rare coding variants of large effect and high penetrance for family‐based sequencing data. The proposed tests are implemented in an R package that is available on CRAN ( https://cran.r-project.org/web/packages/GESE/ ).

15.
Genetic studies often collect multiple correlated traits, which could be analyzed jointly to increase power by aggregating multiple weak effects and provide additional insights into the etiology of complex human diseases. Existing methods for multiple trait association tests have primarily focused on common variants. There is a surprising dearth of published methods for testing the association of rare variants with multiple correlated traits. In this paper, we extend the commonly used sequence kernel association test (SKAT) for single‐trait analysis to test for the joint association of rare variant sets with multiple traits. We investigate the performance of the proposed method through extensive simulation studies. We further illustrate its usefulness with application to the analysis of diabetes‐related traits in the Atherosclerosis Risk in Communities (ARIC) Study. We identified an exome‐wide significant rare variant set in the gene YAP1 worthy of further investigation.

16.
Family‐based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P‐values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP‐SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P‐value GEE test for an SNP‐set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

17.
Haplotype‐based association studies have been proposed as a powerful comprehensive approach to identify causal genetic variation underlying complex diseases. Data comparisons within families offer the additional advantage of dealing naturally with complex sources of noise, confounding and population stratification. Two problems encountered when investigating associations between haplotypes and a continuous trait using data from sibships are (i) the need to define within‐sibship comparisons for sibships of size greater than two and (ii) the difficulty of resolving the joint distribution of haplotype pairs within sibships in the absence of parental genotypes. We therefore propose first a method of orthogonal transformation of both outcomes and exposures that allows the decomposition of between‐ and within‐sibship regression effects when sibship size is greater than two. We conducted a simulation study, which confirmed that analysis using all members of a sibship is statistically more powerful than methods based on cross‐sectional analysis or using subsets of sib‐pairs. Second, we propose a simple permutation approach to avoid errors of inference due to the within‐sibship correlation of any errors in haplotype assignment. These methods were applied to investigate the association between mammographic density (MD), a continuously distributed and heritable risk factor for breast cancer, and single nucleotide polymorphisms (SNPs) and haplotypes from the VDR gene using data from a study of 430 twins and sisters. We found evidence of association between MD and a 4‐SNP VDR haplotype. In conclusion, our proposed method retains the benefits of the between‐ and within‐pair analysis for pairs of siblings and can be implemented in standard software. Genet. Epidemiol. 34: 309–318, 2010. © 2009 Wiley‐Liss, Inc.

18.
Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene‐based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one‐at‐a‐time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model‐based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rates without further corrections. In this paper, we develop a group of tests for rare‐variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within‐subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi‐Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.

19.
There is an emerging interest in sequencing‐based association studies of multiple rare variants. Most association tests suggested in the literature involve collapsing rare variants with or without weighting. Recently, a variance‐component score test [sequence kernel association test (SKAT)] was proposed to address the limitations of the collapsing method. Although SKAT was shown to outperform most of the alternative tests, its applications and power might be restricted and influenced by missing genotypes. In this paper, we suggest a new method based on testing whether the fraction of causal variants in a region is zero. The new association test, T_REM, is derived from a random‐effects model and allows for missing genotypes, and the choice of weighting function is not required when common and rare variants are analyzed simultaneously. We performed simulations to study the type I error rates and power of four competing tests under various conditions on the sample size, genotype missing rate, variant frequency, effect directionality, and the number of non‐causal rare variants and/or causal common variants. The simulation results showed that T_REM was a valid test and less sensitive to the inclusion of non‐causal rare variants and/or low effect common variants or to the presence of missing genotypes. When the effects were more consistent in the same direction, T_REM also had better power performance. Finally, an application to the Shanghai Breast Cancer Study showed that rare causal variants at the FGFR2 gene were detected by T_REM and SKAT, but T_REM produced more consistent results for different sets of rare and common variants. Copyright © 2013 John Wiley & Sons, Ltd.

20.
Recently, many statistical methods have been proposed to test for associations between rare genetic variants and complex traits. Most of these methods test for association by aggregating genetic variations within a predefined region, such as a gene. Although there is evidence that “aggregate” tests are more powerful than the single marker test, these tests generally ignore neutral variants and therefore are unable to identify specific variants driving the association with phenotype. We propose a novel aggregate rare‐variant test that explicitly models a fraction of variants as neutral, tests associations at the gene‐level, and infers the rare‐variants driving the association. Simulations show that in the practical scenario where there are many variants within a given region of the genome with only a fraction causal, our approach has greater power compared to other popular tests such as the Sequence Kernel Association Test (SKAT), the Weighted Sum Statistic (WSS), and the collapsing method of Morris and Zeggini (MZ). Our algorithm leverages a fast variational Bayes approximate inference methodology to scale to exome‐wide analyses, a significant computational advantage over exact inference model selection methodologies. To demonstrate the efficacy of our methodology we test for associations between von Willebrand Factor (VWF) levels and VWF missense rare‐variants imputed from the National Heart, Lung, and Blood Institute's Exome Sequencing Project into 2,487 African Americans within the VWF gene. Our method suggests that a relatively small fraction (~10%) of the imputed rare missense variants within VWF are strongly associated with lower VWF levels in African Americans.
