Similar articles
1.
Genome‐wide association (GWA) studies have proved extremely successful in identifying novel common polymorphisms that contribute to the genetic component underlying complex traits. Nevertheless, determinants mediated through the effects of rare variants remain one as yet undiscovered source of genetic influence on complex traits. With the increasing availability of large‐scale re‐sequencing data for rare variant discovery, we have developed a novel statistical method for detecting complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for identifying rare variant associations in population‐based genetic studies when data are available from re‐sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re‐sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for the future discovery of genetic determinants of complex traits. Genet. Epidemiol. 34: 188–193, 2010. © 2009 Wiley‐Liss, Inc.
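The idea of searching for accumulations of minor alleles within one functional unit can be illustrated with a generic burden (collapsing) test. The Python sketch below is not the authors' method: the function name burden_score_test, the 0/1/2 minor‐allele coding, the 5% frequency cutoff, and the use of a 1‐df logistic score test are all our assumptions.

```python
import math


def burden_score_test(genotypes, phenotype, maf_threshold=0.05):
    """Collapse rare minor alleles within one functional unit into a
    per-individual burden and test it against a 0/1 phenotype.

    genotypes: one row per individual, minor-allele counts (0/1/2) per variant.
    phenotype: 0/1 case-control labels, in the same order as genotypes.
    Returns (chi-square statistic with 1 df, P-value).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Keep variants whose minor-allele frequency in the pooled sample is below the cutoff.
    rare = [j for j in range(n_variants)
            if sum(row[j] for row in genotypes) / (2 * n) < maf_threshold]
    # Burden: how many rare minor alleles each individual carries in this unit.
    burden = [sum(row[j] for j in rare) for row in genotypes]
    y_bar = sum(phenotype) / n
    s_bar = sum(burden) / n
    # Score test of "no burden effect" in a logistic model of phenotype on burden.
    u = sum((y - y_bar) * s for y, s in zip(phenotype, burden))
    var_u = y_bar * (1 - y_bar) * sum((s - s_bar) ** 2 for s in burden)
    if var_u == 0:
        return 0.0, 1.0  # no variation in burden: nothing to test
    stat = u * u / var_u
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(stat)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

Because every rare allele adds to the same score, a test of this kind gains power when most rare variants in the unit push risk in the same direction; that is the usual trade‐off of collapsing approaches.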

2.
The breakthroughs in next generation sequencing have allowed us to access data consisting of both common and rare variants, and in particular to investigate the impact of rare genetic variation on complex diseases. Although rare genetic variants are thought to be important components in explaining the genetic mechanisms of many diseases, discovering these variants remains challenging, and most studies are restricted to population‐based designs. Further, despite the shift in the field of genome‐wide association studies (GWAS) towards studying rare variants because of the “missing heritability” phenomenon, little is known about rare X‐linked variants associated with complex diseases. For instance, there is evidence that X‐linked genes are more involved in brain development and cognition than autosomal genes; however, like most GWAS of other complex traits, previous GWAS of mental disorders have offered few resources for identifying rare‐variant associations on the X chromosome. In this paper, we address both of these issues by proposing a method for testing X‐linked variants using sequencing data on families. Our method is much more general than existing methods: it can detect both common and rare variants, and it is also applicable to autosomes. Our simulation study shows that the method is efficient and exhibits good operating characteristics. An application to the University of Miami Study on Genetics of Autism and Related Disorders also yielded encouraging results.

3.
Genome‐wide association studies (GWAS) of complex diseases have focused primarily on single‐trait analyses of disease status and disease‐related quantitative traits. For example, GWAS of risk factors for coronary artery disease analyze genetic associations with plasma lipids such as total cholesterol, LDL‐cholesterol, HDL‐cholesterol, and triglycerides (TGs) separately. However, traits are often correlated, and a joint analysis may yield greater statistical power for association than multiple univariate analyses. Several multivariate methods proposed recently require individual‐level data. Here, we develop metaUSAT (where USAT is the unified score‐based association test), a novel unified test of association between a single genetic variant and multiple traits that uses only summary statistics from existing GWAS. Existing methods tend either to perform well when most correlated traits are affected by the genetic variant in the same direction, or to be powerful when only a few of the correlated traits are associated; metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual‐level data and can test genetic associations with categorical and/or continuous traits. metaUSAT can also be used to analyze a single trait across multiple studies, appropriately accounting for overlapping samples, if any. It provides an approximate asymptotic P‐value for association and is computationally efficient enough for genome‐wide implementation. Simulation experiments show that metaUSAT maintains proper type I error at low error levels. Across a wide array of scenarios, its power to detect association is similar to, and sometimes greater than, that of existing methods, which are usually powerful only in specific association scenarios.
When applied to plasma lipid summary data from the METSIM and T2D‐GENES studies, metaUSAT detected genome‐wide significant loci beyond those identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are indeed associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of common diseases or traits.
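For orientation, a simpler and well‐known test that also needs only summary statistics is the omnibus chi‐square T = z'R⁻¹z, where z holds one variant's z‐scores for k traits and R is the correlation of those z‐scores across traits; under the null, T follows a chi‐square with k degrees of freedom. This is not metaUSAT itself, only a baseline from the same summary‐statistics setting. In the sketch below the helper names are hypothetical, and R is assumed to be supplied (for example, estimated from the z‐scores of presumably null SNPs).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def multi_trait_chi2(z, trait_corr):
    """Omnibus test of one variant against k traits: T = z' R^{-1} z."""
    x = solve(trait_corr, z)
    stat = sum(zi * xi for zi, xi in zip(z, x))
    return stat, chi2_sf(stat, len(z))
```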

4.
Next generation sequencing technology has enabled a paradigm shift in genetic association studies from the common disease/common variant hypothesis to the common disease/rare variant hypothesis. Analyzing individual rare variants is known to be underpowered; therefore, association methods have been developed that aggregate variants across a genetic region, which for exome sequencing is usually a gene. The foreseeable widespread use of whole genome sequencing poses new challenges for statistical analysis. It calls for new rare‐variant association methods that are statistically powerful, robust against high levels of noise due to the inclusion of noncausal variants, and yet computationally efficient. We propose a simple and powerful statistic that combines the disease‐association P‐values of individual variants using a weight that is the inverse of the expected standard deviation of the allele frequencies under the null. This approach, dubbed the Sigma‐P method, is extremely robust to the inclusion of a high proportion of noncausal variants and is also powerful when both detrimental and protective variants are present within a genetic region. The performance of the Sigma‐P method was tested using simulated data based on realistic population demographic and disease models, and its power was compared with that of several previously published methods. The results demonstrate that this method generally outperforms other rare‐variant association methods over a wide range of models. Additionally, sequence data on the ANGPTL family of genes from the Dallas Heart Study were tested for associations with nine metabolic traits, and both known and novel putative associations were uncovered using the Sigma‐P method.
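The weighting idea can be sketched as follows. The abstract does not state how the P‐values are transformed or how significance is assessed, so this sketch assumes a −log10(P) transform and the binomial standard deviation sqrt(f(1 − f)/(2N)) of an allele frequency f among N diploid individuals. sigma_p_like is a hypothetical name, and the code should not be read as the published Sigma‐P statistic.

```python
import math


def sigma_p_like(p_values, allele_freqs, n_individuals):
    """Weighted combination of per-variant P-values across one region.

    Assumptions (ours, not from the paper): each P-value is transformed to
    -log10(P), and its weight is 1 / SD of a sample allele frequency under
    the null, SD = sqrt(f * (1 - f) / (2 * N)) for N diploid individuals.
    Frequencies must lie strictly between 0 and 1.
    """
    stat = 0.0
    for p, f in zip(p_values, allele_freqs):
        sd = math.sqrt(f * (1 - f) / (2 * n_individuals))
        stat += -math.log10(p) / sd
    return stat
```

Under these assumptions rarer variants receive larger weights, and if two‐sided P‐values are used, detrimental and protective variants both add to the sum. The null distribution of such a sum is not standard, so significance would have to come from, for example, permuting phenotype labels.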

5.
Although genome‐wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” which is likely attributable to rare variants. The advent of next generation sequencing technology has allowed rare variants to be detected and tested for association with common traits, often by investigating specific genomic regions for rare‐variant effects on a trait. Although multiple correlated phenotypes are often observed concurrently in GWAS, most studies analyze only single phenotypes, which may reduce statistical power. To increase power, multivariate analyses, which account for correlations between multiple phenotypes, can be used. However, few existing multivariate analyses can identify rare variants associated with multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS) to identify rare variants associated with multiple phenotypes, based on the widely used sequence kernel association test (SKAT) for a single phenotype. We applied MAAUSS to whole exome sequencing (WES) data from a Korean population of 1,058 subjects to discover genes associated with multiple traits of liver function. We then sought to validate those genes in a replication study using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare the performance of MAAUSS with that of existing methods. Overall, MAAUSS controlled type I error rates and in many cases had higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants associated with multiple phenotypes, with likely relevance to missing heritability.

6.
Recently, many statistical methods have been proposed to test for associations between rare genetic variants and complex traits. Most of these methods test for association by aggregating genetic variation within a predefined region, such as a gene. Although there is evidence that “aggregate” tests are more powerful than single‐marker tests, these tests generally do not account for neutral variants and therefore cannot identify the specific variants driving the association with the phenotype. We propose a novel aggregate rare‐variant test that explicitly models a fraction of variants as neutral, tests associations at the gene level, and infers the rare variants driving the association. Simulations show that in the practical scenario where a region of the genome contains many variants of which only a fraction are causal, our approach has greater power than other popular tests such as the Sequence Kernel Association Test (SKAT), the Weighted Sum Statistic (WSS), and the collapsing method of Morris and Zeggini (MZ). Our algorithm leverages a fast variational Bayes approximate inference methodology to scale to exome‐wide analyses, a significant computational advantage over exact‐inference model selection methodologies. To demonstrate the efficacy of our methodology, we test for associations between von Willebrand Factor (VWF) levels and rare missense variants in the VWF gene, imputed from the National Heart, Lung, and Blood Institute's Exome Sequencing Project into 2,487 African Americans. Our method suggests that a relatively small fraction (~10%) of the imputed rare missense variants within VWF are strongly associated with lower VWF levels in African Americans.

7.
Large genome‐wide association studies (GWAS) have been performed to detect common genetic variants involved in common diseases, but most of the variants found this way account for only a small portion of the trait variance. Furthermore, candidate gene‐based resequencing suggests that many rare genetic variants contribute to the trait variance of common diseases. Here we propose two designs, sibpair and unrelated‐case designs, to detect rare genetic variants in either a candidate gene‐based or a genome‐wide association analysis. First, we show that with either design, rare risk haplotypes can be detected and classified together using a relatively small sample, which then gives increased power to test association in a larger case‐control sample. This method can also be applied to resequencing data. Next, we apply the method to the Wellcome Trust Case Control Consortium (WTCCC) coronary artery disease (CAD) and hypertension (HT) data (HT being the only trait for which the original WTCCC study reported no genome‐wide association evidence) and identify one interesting gene associated with HT and four associated with CAD at a genome‐wide significance level of 5%. These results suggest that searching for rare genetic variants is feasible and can be fruitful in current GWAS, candidate gene studies, or resequencing studies. Genet. Epidemiol. 34: 171–187, 2010. © 2009 Wiley‐Liss, Inc.

8.
Many longitudinal cohort studies have both genome‐wide measures of genetic variation and repeated measures of phenotypes and environmental exposures. Genome‐wide association study analyses have typically used only cross‐sectional data to evaluate quantitative phenotypes and binary traits. Incorporating repeated measures may increase power to detect associations, but also requires specialized analysis methods. Here, we discuss one such method, generalized estimating equations (GEE), in the context of analyzing main effects of rare genetic variants and gene‐environment interactions. We illustrate the potential for increased power using GEE analyses instead of cross‐sectional analyses. We also address challenges that arise, such as the need for small‐sample corrections when the minor allele frequency of a genetic variant and/or the prevalence of an environmental exposure is low. To illustrate methods for detecting gene‐drug interactions on a genome‐wide scale using repeated measures data, we conduct single‐study analyses and meta‐analyses in three large cohort studies participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium: the Atherosclerosis Risk in Communities study, the Cardiovascular Health Study, and the Rotterdam Study. Copyright © 2014 John Wiley & Sons, Ltd.

9.
Genome‐wide association studies (GWAS) have been successful in identifying common variants related to complex disorders. However, some disorders have proved resistant to this strategy with few associations confirmed, despite evidence from twin and family studies of a genetic component. Sophisticated strategies that account for phenotypic heterogeneity may be required to uncover these genetic contributions. Age at onset is an example of a potential source of this heterogeneity in ischaemic stroke. We explore the contribution of age at onset in the Wellcome Trust Case‐Control Consortium 2 ischaemic stroke study. We first examine four established stroke loci in younger onset cases. We extend this to all single‐nucleotide polymorphisms (SNPs) genome‐wide, testing for stronger association signals in younger subsets of cases. Finally, we estimate the pseudoheritability accounted for by common SNPs present on genome‐wide genotyping arrays for cases stratified by age at onset. We find evidence for stronger associations in younger onset cases for the four established stroke loci. Genome‐wide, in cardioembolic and small vessel stroke subphenotypes, a significant number of SNPs show stronger association P‐values when the oldest cases are removed. Finally, we show that the pseudoheritability estimated by common SNPs in cardioembolic stroke increased from 16.5% for older onset cases to 28.5% for younger onset cases. Our results indicate that age at onset is a valuable measure for case ascertainment and in analysis of GWAS in ischaemic stroke: focussing on younger cases who may have a stronger genetic predisposition increases power to detect associations.

10.
Genetic studies often collect multiple correlated traits, which could be analyzed jointly to increase power by aggregating multiple weak effects and to provide additional insights into the etiology of complex human diseases. Existing methods for multiple trait association tests have primarily focused on common variants. There is a surprising dearth of published methods for testing the association of rare variants with multiple correlated traits. In this paper, we extend the commonly used sequence kernel association test (SKAT) for single‐trait analysis to test for the joint association of rare variant sets with multiple traits. We investigate the performance of the proposed method through extensive simulation studies. We further illustrate its usefulness with an application to the analysis of diabetes‐related traits in the Atherosclerosis Risk in Communities (ARIC) Study. We identified an exome‐wide significant rare variant set in the gene YAP1 worthy of further investigation.

11.
For analyzing complex trait association with sequencing data, most current studies test the aggregated effects of variants in a gene or genomic region. Gene‐based tests can have insufficient power even for moderately sized samples; pathway‐based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods were originally designed for genome‐wide association studies and have not been comprehensively evaluated for sequencing data. Moreover, region‐based rare variant association methods, although potentially applicable to pathway‐based analysis by extending their region definition to gene sets, have never been rigorously tested. In the context of exome‐based studies, we use simulated and real datasets to evaluate pathway‐based association tests. Our simulation strategy adopts a genome‐wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing pathway‐based methods to be evaluated under realistic, quantifiable assumptions about the underlying genetic architecture. The results show that, although no single pathway‐based association method offers superior performance in all simulated scenarios, a modification of the Gene Set Enrichment Analysis approach that uses statistics from single‐marker tests without gene‐level collapsing (the weighted Kolmogorov‐Smirnov [WKS]‐Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., the sequence kernel association test) to pathway analysis offers similar power, but the results are sensitive to assumptions about the genetic architecture. We applied pathway association analysis to exome‐sequencing data on chronic obstructive pulmonary disease and found that the WKS‐Variant method confirms previously published associated genes.
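The core of Gene Set Enrichment Analysis is a weighted Kolmogorov‐Smirnov running sum over a ranked list. The sketch below computes that standard enrichment score; in a WKS‐Variant‐style analysis, as the abstract describes it, the ranked items would be individual variants scored by single‐marker tests, with a pathway's variants as the set. The exponent weight_p = 1 and the function name are our choices, and significance assessment (usually by permutation) is not included.

```python
def enrichment_score(stats, gene_set, weight_p=1.0):
    """Weighted Kolmogorov-Smirnov running-sum enrichment score.

    stats: mapping from item (e.g. variant or gene) to its test statistic.
    gene_set: items belonging to the pathway.
    Returns the signed maximum deviation of the running sum from zero.
    """
    ranked = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
    hits = [name in gene_set for name, _ in ranked]
    hit_weight = sum(abs(s) ** weight_p for (_, s), h in zip(ranked, hits) if h)
    n_miss = len(ranked) - sum(hits)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running = best = 0.0
    for (_, s), hit in zip(ranked, hits):
        # Hits step up in proportion to |statistic|^p; misses step down evenly.
        running += abs(s) ** weight_p / hit_weight if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the set's members cluster near the top of the ranking; a negative score means they cluster near the bottom.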

12.
With the development of sequencing technologies, direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population‐based methods for unrelated individuals. A limitation of population‐based methods is that spurious associations can occur when there is population structure. For rare variants this problem can be more serious, both because the spectrum of rare variation can differ greatly between populations and because methods to control for population stratification in population‐based rare variant association tests do not currently exist. One solution to the problem of population stratification is to use family‐based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW‐PAC). TOW‐PAC is a family‐based association test that tests the combined effect of rare and common variants in a genomic region and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family‐based association tests are robust to population stratification, whereas population‐based association tests can be seriously confounded by it. Power comparisons show that the power of TOW‐PAC increases with the number of affected children in each family, and that TOW‐PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.
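TOW‐PAC itself is not reproduced here. To illustrate why comparing transmitted with untransmitted alleles guards against population stratification, the following sketch implements the classic transmission/disequilibrium test (TDT) for parent‐affected child trios, a different and much simpler family‐based test. The function name and the 0/1/2 minor‐allele genotype coding are assumptions.

```python
import math


def tdt_from_trios(trios):
    """Classic TDT from (father, mother, child) genotypes coded as
    minor-allele counts 0/1/2; every child is assumed to be affected.

    Returns (b, c, statistic, p_value): b counts heterozygous parents that
    transmitted the minor allele, c those that transmitted the major allele.
    """
    b = c = 0
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        if n_het == 2:
            # Both parents heterozygous: the child's genotype fixes the totals.
            b += child
            c += 2 - child
        elif n_het == 1:
            # The homozygous parent transmits a known number of minor alleles.
            homo = father if father != 1 else mother
            transmitted = child - homo // 2
            if transmitted in (0, 1):  # skip Mendelian inconsistencies
                b += transmitted
                c += 1 - transmitted
    if b + c == 0:
        return b, c, 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # 1-df chi-square upper tail.
    return b, c, stat, math.erfc(math.sqrt(stat / 2))
```

Only heterozygous parents are informative, and each transmission is compared with the other allele of the same parent, so allele‐frequency differences between subpopulations cannot by themselves create a signal.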

13.
In association studies of complex traits, fixed‐effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance‐component tests based on mixed models have been developed for region‐based genetic variant association tests. In the mixed models, association is tested against a null hypothesis of zero variance using the sequence kernel association test (SKAT), its optimal unified test (SKAT‐O), or a combined sum test of rare and common variant effects (SKAT‐C). Although some studies have compared the performance of mixed and fixed models, there has been no systematic analysis of when the mixed models perform better and when the fixed models perform better. Here we evaluated, through extensive simulations, the performance of the fixed‐ and mixed‐model statistics using genetic variants located in simulated regions of 3, 6, 9, 12, and 15 kb. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT‐O, and SKAT‐C, (ii) traditional fixed‐effect additive models, and (iii) fixed‐effect functional regression models. To evaluate the type I error rates of the fixed‐model tests, we generated genotype data in two ways: (i) using all variants and (ii) using only rare variants. We found that the fixed‐effect tests either control type I error accurately or have low false positive rates. We performed simulation analyses to compare power in two scenarios: (i) all causal variants are rare, and (ii) some causal variants are rare and some are common. One or both of the fixed‐effect models performed better than or similarly to the mixed models, except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumptions of the mixed models may be satisfied, and SKAT/SKAT‐O/SKAT‐C may perform better, if the number of causal variants is large and each causal variant contributes a small amount to the trait (i.e., polygenes).
In major gene association studies, we argue that the fixed‐effect models perform better than or similarly to mixed models in most cases, because some variants are expected to have relatively large effects on the traits. In practice, it makes sense to analyze the data with both fixed‐ and mixed‐effect models and compare the results, which can be readily done using our R code and the SKAT package.
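The SKAT statistic mentioned above can be sketched for the simplest case, a quantitative trait with no covariates, where the score statistic is Q = Σ_j w_j² (Σ_i g_ij (y_i − ȳ))². The default Beta(1, 25) weights on minor allele frequency follow the SKAT software's default; the function names are hypothetical, and the permutation P‐value is our stand‐in for the analytic mixture‐of‐chi‐squares approximation (e.g. Davies' method) that the SKAT package uses. SKAT‐O and SKAT‐C are not covered.

```python
import random


def skat_q(genotypes, trait, weights=None):
    """SKAT-style score statistic for a quantitative trait, no covariates:
    Q = sum_j w_j^2 * (sum_i g_ij * (y_i - y_bar))^2.

    genotypes: one row per individual, allele counts (0/1/2) per variant.
    By default each weight is the Beta(1, 25) density at the variant's MAF.
    """
    n = len(trait)
    n_variants = len(genotypes[0])
    if weights is None:
        weights = []
        for j in range(n_variants):
            freq = sum(row[j] for row in genotypes) / (2 * n)
            maf = min(freq, 1 - freq)
            weights.append(25 * (1 - maf) ** 24)  # Beta(1, 25) density
    y_bar = sum(trait) / n
    resid = [y - y_bar for y in trait]
    q = 0.0
    for j in range(n_variants):
        score = sum(row[j] * r for row, r in zip(genotypes, resid))
        q += (weights[j] * score) ** 2
    return q


def skat_permutation_p(genotypes, trait, n_perm=999, seed=1):
    """Permutation P-value for Q by shuffling trait values (illustrative only)."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, trait)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_q(genotypes, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because each variant's score is squared before summing, variants with opposite effect directions do not cancel, which is the main contrast with burden‐style fixed‐effect tests.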

14.
A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.

15.
Genome‐wide association (GWA) studies have proved extremely successful in identifying novel genetic loci contributing effects to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly due to the need for samples consisting of many thousands of individuals. Large‐scale international initiatives, such as the Wellcome Trust Case Control Consortium, the Genetic Association Information Network, and the database of genetic and phenotypic information, aim to facilitate discovery of modest‐effect genes by making genome‐wide data publicly available, allowing information to be combined for the purpose of pooled analysis. In principle, disease or control samples from these studies could be used to increase the power of any GWA study via judicious use as “genetically matched controls” for other traits. Here, we present the biological motivation for the problem and the theoretical potential for expanding the control group with publicly available disease or reference samples. We demonstrate that a naïve application of this strategy can greatly inflate the false‐positive error rate in the presence of population structure. As a remedy, we make use of genome‐wide data and model selection techniques to identify “axes” of genetic variation which are associated with disease. These axes are then included as covariates in association analysis to correct for population structure, which can result in increases in power over standard analysis of genetic information from the samples in the original GWA study. Genet. Epidemiol. 34: 319–326, 2010. © 2010 Wiley‐Liss, Inc.
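Axes of genetic variation are commonly obtained by principal components analysis of the standardized genotype matrix. The pure‐Python sketch below (top_genetic_axis is a hypothetical name) finds only the leading axis by power iteration, and it omits the model‐selection step the abstract uses to decide which axes are associated with disease.

```python
import math
import random


def top_genetic_axis(genotypes, n_iter=200, seed=0):
    """Scores of individuals on the leading axis of genetic variation.

    Each variant (column) is centred and scaled to unit variance; the top
    right singular vector v of the standardized matrix X is found by power
    iteration on X'X, and the returned scores are X v.
    """
    n, m = len(genotypes), len(genotypes[0])
    x = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((g - mean) ** 2 for g in col) / n)
        for i in range(n):
            x[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(n_iter):
        xv = [sum(xi[j] * v[j] for j in range(m)) for xi in x]
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X'(Xv)
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0:
            break
        v = [wj / norm for wj in w]
    return [sum(xi[j] * v[j] for j in range(m)) for xi in x]
```

In practice several axes would be computed with a linear‐algebra library, and the selected ones added as covariates to the association model.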

16.
Most common hereditary diseases in humans are complex and multifactorial. Large‐scale genome‐wide association studies based on SNP genotyping have identified only a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF, <5%), which are not included on common genotyping platforms, contribute substantially to the genetic variation of these diseases. Next‐generation sequencing, which allows the analysis of rare variants, is now becoming cheap enough to provide a viable alternative to SNP genotyping. In this paper, we present cost‐effective protocols for using next‐generation sequencing in association mapping studies based on pooled and un‐pooled samples, and identify optimal designs with respect to the total number of individuals, the number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon capture. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next‐generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that at a fixed cost, sequencing many individuals at shallower depth with larger pool sizes achieves higher power than sequencing a small number of individuals at greater depth with smaller pool sizes, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next‐generation sequencing. Genet. Epidemiol. 34: 479–491, 2010. © 2010 Wiley‐Liss, Inc.
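A much simplified read‐count likelihood ratio test that allows for sequencing error is sketched below for one pool of cases and one pool of controls. It assumes each pool is summarised by its minor‐allele read count and total read count, with a known, symmetric per‐read error rate eps, and it ignores the pooling variance the abstract mentions evaluating. pooled_lrt and its helpers are hypothetical names, not the paper's statistic.

```python
import math


def _read_loglik(k, n, f, eps):
    # Log-likelihood (up to a constant) of k minor-allele reads out of n,
    # when the pool frequency is f and each read is miscalled with prob. eps.
    q = f * (1 - eps) + (1 - f) * eps
    return k * math.log(q) + (n - k) * math.log(1 - q)


def _mle_freq(k, n, eps):
    # The likelihood is concave in q, so the MLE of f is the inverted read
    # proportion clipped to the valid range [0, 1].
    f = (k / n - eps) / (1 - 2 * eps)
    return min(1.0, max(0.0, f))


def pooled_lrt(case_reads, control_reads, eps=0.01):
    """LRT that a case pool and a control pool share one minor-allele frequency.

    Each argument is (minor-allele reads, total reads); eps must lie in (0, 0.5).
    Returns (statistic, 1-df chi-square P-value).
    """
    (k1, n1), (k2, n2) = case_reads, control_reads
    f1, f2 = _mle_freq(k1, n1, eps), _mle_freq(k2, n2, eps)
    f0 = _mle_freq(k1 + k2, n1 + n2, eps)
    alt = _read_loglik(k1, n1, f1, eps) + _read_loglik(k2, n2, f2, eps)
    # Under H0 both pools share f0, so their log-likelihoods add to the pooled one.
    null = _read_loglik(k1 + k2, n1 + n2, f0, eps)
    stat = max(0.0, 2 * (alt - null))
    return stat, math.erfc(math.sqrt(stat / 2))
```

Setting eps above zero keeps every read probability strictly inside (0, 1), so a pool with no minor‐allele reads still has a finite likelihood.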

17.
Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next‐generation sequencing technology, there has been substantial interest in identifying rare variants in genes that have cross‐phenotype effects. In the presence of such effects, modeling the phenotypes and rare variants jointly with multivariate models can achieve higher statistical power than univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time, and using such longitudinal data can further increase the power to detect genetic associations. Although rare‐variant approaches exist for testing cross‐phenotype effects at a single time point, there is no analogous method for such analyses using longitudinal outcomes. To fill this important gap, we propose an extension of the Gene Association with Multiple Traits (GAMuT) test, a method for cross‐phenotype analysis of rare variants within a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome‐wide scale because it uses a closed‐form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power compared with competing approaches, and we also apply it to exome chip data from the Genetic Epidemiology Network of Arteriopathy.
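For intuition about the distance‐covariance framework, a common form of such a statistic compares a phenotypic similarity matrix with a genotypic similarity matrix through the trace of their centred product. The sketch below uses linear similarity kernels (our choice), handles a single time point only, and does not compute an analytic P‐value. kernel_association_stat is a hypothetical name, and this is not the longitudinal extension proposed here.

```python
def kernel_association_stat(phenotypes, genotypes):
    """Trace statistic (1/n) * tr(Kc_P Kc_G) for linear similarity kernels.

    phenotypes: n x k matrix (list of rows); genotypes: n x m matrix.
    Centring the columns before forming the linear kernel is equivalent to
    double-centring the kernel matrix with H = I - 11'/n.
    """
    def centered_kernel(mat):
        n, p = len(mat), len(mat[0])
        means = [sum(row[j] for row in mat) / n for j in range(p)]
        c = [[row[j] - means[j] for j in range(p)] for row in mat]
        # Linear kernel: inner products between centred individuals.
        return [[sum(a * b for a, b in zip(ci, cj)) for cj in c] for ci in c]

    kp = centered_kernel(phenotypes)
    kg = centered_kernel(genotypes)
    n = len(phenotypes)
    return sum(kp[i][j] * kg[j][i] for i in range(n) for j in range(n)) / n
```

The statistic is zero whenever either matrix carries no variation (for example, a phenotype that is constant across individuals), and it grows as pairs of individuals who are similar genetically are also similar phenotypically.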

18.
Genome‐wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype‐phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping‐based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome‐wide significance thresholds for various analysis scenarios. Using whole‐genome sequence data, we simulated sequencing‐based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome‐wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome‐wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
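One way to read these thresholds is through a Bonferroni correction at a family‐wise error rate of 0.05, where the per‐test threshold equals 0.05 divided by the number of independent tests. This interpretation is ours (the abstract derives the thresholds empirically from simulation), and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for family-wise error rate alpha."""
    return alpha / n_tests


def implied_tests(alpha, threshold):
    """Number of independent tests for which `threshold` is the Bonferroni
    threshold at family-wise error rate `alpha`."""
    return alpha / threshold
```

On this reading, 5 × 10⁻⁹ corresponds to about 10 million independent tests and 1 × 10⁻⁹ to about 50 million, compared with roughly 1 million for the conventional genotyping‐array GWAS threshold of 5 × 10⁻⁸.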

19.
We are interested in investigating the involvement of multiple rare variants within a given region by conducting analyses of individual regions with two goals: (1) to determine if regional rare variation in aggregate is associated with risk; and (2) conditional upon the region being associated, to identify specific genetic variants within the region that are driving the association. In particular, we seek a formal integrated analysis that achieves both of our goals. For rare variants with low minor allele frequencies, there is very little power to statistically test the null hypothesis of equal allele or genotype counts for each variant. Thus, genetic association studies are often limited to detecting association within a subset of the common genetic markers. However, it is very likely that associations exist for the rare variants that may not be captured by the set of common markers. Our framework aims at constructing a risk index based on multiple rare variants within a region. Our analytical strategy is novel in that we use a Bayesian approach to incorporate model uncertainty in the selection of variants to include in the index as well as the direction of the associated effects. Additionally, the approach allows for inference at both the group and variant-specific levels. Using a set of simulations, we show that our methodology has added power over other popular rare variant methods to detect global associations. In addition, we apply the approach to sequence data from the WECARE Study of second primary breast cancers.

20.
It is 100 years since R. A. Fisher proposed that a Mendelian model of genetic variant effects, additive over loci, could explain the patterns of observed phenotypic correlations between relatives. His loci were hypothetical and his model theoretical. It is only about 50 years since the first genetic markers allowed the detection of even variants with major effects on phenotype, and only 20 years since the development of single-nucleotide polymorphism technology provided dense markers across the genome. Mapping in defined pedigrees and genome-wide association studies in population samples then allowed the detection of multiple contributing variants of smaller effect. Finally, with methods based on genotypic correlations between individuals, or on allelic associations between loci, the additive heritability contributions of the genome can be estimated from large population samples. In this review we trace, from 1918 to 2018, the analysis of observed phenotypic correlations between relatives to estimate the underlying genetic components of traits in human populations. As with studies from 1918 onward, we use height as the example trait: not only are data readily available, but Fisher's model of large numbers of variants of infinitesimal effect appears to provide a good approximation to reality. However, we also trace the use of phenotypic and genotypic correlations between relatives in mapping causal variants and resolving genetic contributions to more complex human traits. With the availability of DNA sequence data, we can hope not only to estimate the total genetic contribution to a trait, but also to resolve the effects of individual genetic variants on biological function.
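Fisher's additive model gives a simple link between phenotypic correlation and heritability: with no dominance and no shared environment, relatives with kinship coefficient φ have expected correlation r = 2φh². The sketch below inverts that relation and also shows Falconer's twin formula h² = 2(r_MZ − r_DZ), which follows from the same model because, under the equal‐environments assumption, a common shared‐environment term cancels. The function names are hypothetical.

```python
def heritability_from_relatives(r, kinship):
    """Narrow-sense heritability under a purely additive model with no
    shared environment: phenotypic correlation r = 2 * kinship * h2."""
    return r / (2 * kinship)


def falconer_h2(r_mz, r_dz):
    """Falconer's twin estimate: MZ kinship 1/2, DZ kinship 1/4, so
    r_MZ - r_DZ = h2 / 2 once a shared-environment term cancels."""
    return 2 * (r_mz - r_dz)
```

With illustrative (not reported) numbers, a parent‐offspring correlation of 0.4 (φ = 1/4) implies h² = 0.8, and twin correlations of 0.8 (MZ) and 0.45 (DZ) give h² = 0.7.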

