Similar articles
 20 similar articles found
1.
Optimal designs for two-stage genome-wide association studies   Total citations: 3, self-citations: 0, citations by others: 3
Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in stage 2. We describe a strategy for designing cost-effective two-stage GWA studies. Our strategy preserves much of the power of the corresponding one-stage design and minimizes the genotyping cost of the study while allowing for differences in per-genotype cost between stages 1 and 2. We show that the ratio of stage 2 to stage 1 per-genotype cost can strongly influence both the optimal design and the genotyping cost of the study. Increasing the stage 2 per-genotype cost shifts more of the genotyping and study cost to stage 1, and increases the cost of the study. This higher cost can be partially mitigated by adopting a design with reduced power while preserving the false positive rate or by increasing the false positive rate while preserving power. For example, reducing the power preserved in the two-stage design from 99% to 95% of that of the one-stage design decreases the two-stage study cost by approximately 15%. Alternatively, the same cost savings can be had by relaxing the false positive rate by 2.5-fold, for example from 1/300,000 to 2.5/300,000, while retaining the same power.
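The cost structure of the staged design described in the abstract above can be sketched in a few lines of Python. This is only an illustrative accounting of genotyping cost, not the authors' optimization procedure; the function names, the parameter names, and the example numbers are all assumptions made for this sketch.

```python
def one_stage_cost(n_samples, n_markers, cost_per_genotype=1.0):
    """Cost of genotyping every sample on every marker."""
    return n_samples * n_markers * cost_per_genotype


def two_stage_cost(n_samples, n_markers, frac_samples_stage1,
                   frac_markers_stage2, cost_stage1=1.0, cost_stage2=1.0):
    """Genotyping cost of a two-stage design (hypothetical helper).

    Stage 1: a fraction of the samples is genotyped on all markers.
    Stage 2: the remaining samples are genotyped on a fraction of the markers.
    """
    stage1 = n_samples * frac_samples_stage1 * n_markers * cost_stage1
    stage2 = (n_samples * (1.0 - frac_samples_stage1)
              * n_markers * frac_markers_stage2 * cost_stage2)
    return stage1 + stage2


if __name__ == "__main__":
    n, m = 2000, 300_000  # illustrative study size, not taken from the paper
    full = one_stage_cost(n, m)
    for ratio in (1.0, 5.0, 20.0):  # stage 2 to stage 1 per-genotype cost ratio
        cost = two_stage_cost(n, m, 0.3, 0.01, 1.0, ratio)
        print(f"cost ratio {ratio:>4}: {100 * cost / full:.1f}% of one-stage cost")
```

With the design fractions held fixed, raising the stage 2 per-genotype cost raises the total cost; the abstract's point is that the optimal fractions themselves also shift, which this sketch does not attempt to compute.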

2.
With the establishment of large consortiums of researchers, genome-wide association (GWA) studies have become increasingly popular and feasible. Although most of these association studies focus on unrelated individuals, many advantages can be gained by including families in the analysis as well. To overcome the additional genotyping cost, multi-stage designs are particularly useful. In this article, I offer a perspective on genome-wide family-based association analyses, within both a model-based and a model-free paradigm. I highlight how multi-stage designs and analysis techniques, which are quite popular in clinical epidemiology, can enter GWA settings. I furthermore discuss how they have proven successful in reducing analysis complexity, and in overcoming one of the most cumbersome statistical hurdles in the genome-wide context, namely controlling increased false positives due to multiple testing.

3.
The interpretation of the results of large association studies encompassing much or all of the human genome faces the fundamental statistical problem that a correspondingly large number of single nucleotide polymorphism markers will be spuriously flagged as significant. A common method of dealing with these false positives is to raise the significance level for the individual tests for association of each marker. Any such adjustment for multiple testing is ultimately based on a more or less precise estimate for the actual overall type I error probability. We estimate this probability for association tests for correlated markers and show that it depends in a nonlinear way on the significance level for the individual tests. This dependence of the effective number of tests is not taken into account by existing multiple-testing corrections, leading to widely overestimated results. We demonstrate a simple correction for multiple testing, which can easily be calculated from the pairwise correlation and gives far more realistic estimates for the effective number of tests than previous formulae. The calculation is considerably faster than with other methods and hence applicable on a genome-wide scale. The efficacy of our method is shown on a constructed example with highly correlated markers as well as on real data sets, including a full genome scan where a conservative estimate only 8% above the permutation estimate is obtained in about 1% of computation time. As the calculation is based on pairwise correlations between markers, it can be performed at the stage of study design using public databases.

4.
Li C, Li M, Long JR, Cai Q, Zheng W. Genetic Epidemiology 2008, 32(5): 387-395
Genome-wide association (GWA) studies have recently emerged as a major approach to gene discovery for many complex diseases. Since GWA scans are expensive, cost efficiency is an important factor to consider in study design. However, it often requires extensive and time-consuming computer simulations to compare cost efficiency across different single nucleotide polymorphism (SNP) chips. Here, we propose two simulation-free approaches to cost efficiency comparisons across SNP chips. In the first method, the overall power under a given disease model is calculated for each SNP chip and various sample sizes. Then SNP chips can be compared with respect to the sample sizes required to achieve the same level of power. In the second method, for a desired level of genomic coverage, the effective r² threshold values are calculated for each SNP chip. Since r² is inversely proportional to the sample size needed to achieve the same power, the required sample sizes can then be compared among SNP chips. These two methods are complementary to each other. The first approach provides direct power comparisons, but it requires information on the disease model and may not be reliable for SNP chips that contain many non-HapMap SNPs. The second approach allows sample size comparisons based on the coverage of SNP chips, and it can be modified for SNP chips that contain non-HapMap SNPs. These methods are particularly relevant for large epidemiological studies in which enough subjects are available for GWA screening and follow-up stages. We illustrate these approaches using five currently available whole genome SNP chips.

5.
Though multiple interacting loci are likely involved in the etiology of complex diseases, early genome-wide association studies (GWAS) have depended on the detection of the marginal effects of each locus. Here, we evaluate the power of GWAS in the presence of two linked and potentially associated causal loci for several models of interaction between them and find that interacting loci may give rise to marginal relative risks that are not generally considered in a one-locus model. To derive power under realistic situations, we use empirical data generated by the HapMap ENCODE project for both allele frequencies and linkage disequilibrium (LD) structure. The power is also evaluated in situations where the causal single nucleotide polymorphisms (SNPs) may not be genotyped, but rather detected by proxy using a tag SNP (tSNP) in LD. A common simplification for such power computations assumes that the sample size necessary to detect the effect at the tSNP is the sample size necessary to detect the causal locus directly divided by the LD measure r² between the two. This assumption, which we call the "proportionality assumption", is a simplification of the many factors that contribute to the strength of association at a marker, and has recently been criticized as unreasonable (Terwilliger and Hiekkalinna [2006] Eur J Hum Genet 14(4):426-437), in particular in the presence of interacting and associated loci. We find that this assumption does not introduce much error in single-locus models of disease, but may do so in certain two-locus models.
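The "proportionality assumption" described in the abstract above amounts to a one-line calculation: the sample size needed at the tag SNP is the sample size needed at the causal locus divided by r². The sketch below only illustrates that rule; the function name and the example numbers are assumptions, and the abstract itself notes that the rule can fail in some two-locus models.

```python
import math


def tag_snp_sample_size(n_causal, r2):
    """Sample size needed at a tag SNP under the proportionality assumption.

    n_causal: sample size needed when the causal SNP is genotyped directly.
    r2:       LD measure r^2 between the causal SNP and the tag SNP.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    # Round up, since a fraction of a subject cannot be recruited.
    return math.ceil(n_causal / r2)


if __name__ == "__main__":
    for r2 in (1.0, 0.5, 0.25):  # illustrative LD values
        print(f"r2 = {r2}: n = {tag_snp_sample_size(1000, r2)}")
```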

6.
Genome-wide association studies are carried out to identify unknown genes for a complex trait. Polymorphisms showing the most statistically significant associations are reported and followed up in subsequent confirmatory studies. In addition to the test of association, the statistical analysis provides point estimates of the relationship between the genotype and phenotype at each polymorphism, typically an odds ratio in case-control association studies. The statistical significance of the test and the estimator of the odds ratio are completely correlated. Selecting the most extreme statistics is equivalent to selecting the most extreme odds ratios. The value of the estimator, given the value of the statistical significance, depends on the standard error of the estimator and the power of the study. This report shows that when power is low, estimates of the odds ratio from a genome-wide association study, or any large-scale association study, will be upwardly biased. Genome-wide association studies are often underpowered given the low alpha levels required to declare statistical significance and the small individual genetic effects known to characterize complex traits. Factors such as low allele frequency, inadequate sample size and weak genetic effects contribute to large standard errors in the odds ratio estimates, low power and upwardly biased odds ratios. Studies that have high power to detect an association with the true odds ratio will have little or no bias, regardless of the statistical significance threshold. The results have implications for the interpretation of genome-wide association analysis and the planning of subsequent confirmatory stages.
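The upward bias described in the abstract above can be illustrated with a truncated-normal calculation. The sketch assumes the estimated log odds ratio is normally distributed around the true value with a known standard error and that only results passing a two-sided significance threshold are reported; this is a simplification chosen for illustration, not the report's own derivation, and the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_and_reported_or(true_or, se_log_or, alpha):
    """Return (power, geometric-mean odds ratio among significant results).

    Assumes log(OR_hat) ~ Normal(log(true_or), se_log_or**2) and a two-sided
    test at level alpha; true_or is assumed to be greater than 1.
    """
    mu = math.log(true_or) / se_log_or          # mean of the Z statistic
    c = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)   # critical value
    upper = STD_NORMAL.cdf(mu - c)              # P(Z > c)
    lower = STD_NORMAL.cdf(-c - mu)             # P(Z < -c)
    power = upper + lower
    # Mean of Z conditional on |Z| > c (truncated normal).
    mean_z = mu + (STD_NORMAL.pdf(c - mu) - STD_NORMAL.pdf(-c - mu)) / power
    return power, math.exp(mean_z * se_log_or)


if __name__ == "__main__":
    for se in (0.10, 0.05, 0.02):  # smaller standard error means higher power
        power, reported = power_and_reported_or(1.2, se, 1e-7)
        print(f"se={se}: power={power:.3g}, reported OR ~ {reported:.2f}")
```

Under these assumptions, the reported odds ratio sits well above the true value of 1.2 when power is low and approaches it as power nears 1.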

7.
Despite the success of genome-wide association studies, much of the genetic contribution to complex human traits is still unexplained. One potential source of genetic variation that may contribute to this "missing heritability" is that which differs in magnitude and/or direction between males and females, which could result from sexual dimorphism in gene expression. Such sex-differentiated effects are common in model organisms, and are becoming increasingly evident in human complex traits through large-scale male- and female-specific meta-analyses. In this article, we review the methodology for meta-analysis of sex-specific genome-wide association studies, and propose a sex-differentiated test of association with quantitative or dichotomous traits, which allows for heterogeneity of allelic effects between males and females. We perform detailed simulations to compare the power of the proposed sex-differentiated meta-analysis with the more traditional "sex-combined" approach, which is agnostic to gender. The results of this study highlight only a small loss in power for the sex-differentiated meta-analysis when the allelic effects of the causal variant are the same in males and females. However, over a range of models of heterogeneity in allelic effects between genders, our sex-differentiated meta-analysis strategy offers substantial gains in power, and thus has the potential to discover novel loci contributing effects to complex human traits with existing genome-wide association data.

8.
Genome-wide association (GWA) studies have been extremely successful in identifying novel loci contributing effects to a wide range of complex human traits. However, despite this success, the joint marginal effects of these loci account for only a small proportion of the heritability of these traits. Interactions between variants in different loci are not typically modelled in traditional GWA analysis, but may account for some of the missing heritability in humans, as they do in other model organisms. One of the key challenges in performing gene-gene interaction studies is the computational burden of the analysis. We propose a two-stage interaction analysis strategy to address this challenge in the context of both quantitative traits and dichotomous phenotypes. We have performed simulations to demonstrate only a negligible loss in power of this two-stage strategy, while minimizing the computational burden. Application of this interaction strategy to GWA studies of type 2 diabetes (T2D) and obesity highlights potential novel signals of association, which warrant follow-up in larger cohorts.

9.
Qin H, Zhu X. Genetic Epidemiology 2012, 36(3): 235-243
When dense markers are available, one can interrogate almost every common variant across the genome via imputation and single nucleotide polymorphism (SNP) tests, which has become routine in current genome-wide association studies (GWASs). As a complement, admixture mapping exploits the long-range linkage disequilibrium (LD) generated by admixture between genetically distinct ancestral populations. It is then questionable whether admixture mapping analysis is still necessary in detecting the disease-associated variants in admixed populations. We argue that admixture mapping is able to reduce the burden of massive comparisons in GWASs; it therefore can be a powerful tool to locate the disease variants with substantial allele frequency differences between ancestral populations. In this report we studied a two-stage approach, where candidate regions are defined by conducting admixture mapping at stage 1, followed at stage 2 by single SNP association tests within the candidate regions defined at stage 1. We first established by simulations the genome-wide significance levels corresponding to the criteria used to define the candidate regions at stage 1. We next compared the power of the two-stage approach with direct association analysis. Our simulations suggest that the two-stage approach can be more powerful than the standard genome-wide association analysis when the allele frequency difference of a causal variant in ancestral populations is larger than 0.4. Our conclusion is consistent with a theoretical prediction by Risch and Tang ([2006] Am J Hum Genet 79:S254). Surprisingly, our study also suggests that power can be improved when we use less strict criteria to define the candidate regions at stage 1.

10.
Improving power in genome-wide association studies: weights tip the scale   Total citations: 3, self-citations: 0, citations by others: 3
The potential of genome-wide association analysis can only be realized when it has power to detect signals despite the detrimental effect of multiple testing on power. We develop a weighted multiple testing procedure that facilitates the input of prior information in the form of groupings of tests. For each group a weight is estimated from the observed test statistics within the group. Differentially weighting groups improves the power to detect signals in likely groupings. The advantage of the grouped-weighting concept, over fixed weights based on prior information, is that it often leads to an increase in power even if many of the groupings are not correlated with the signal. Being data dependent, the procedure is remarkably robust to poor choices in groupings. Power is typically improved if one (or more) of the groups clusters multiple tests with signals, yet little power is lost when the groupings are totally random. If there is no apparent signal in a group, relative to a group that appears to have several tests with signals, the former group will be down-weighted relative to the latter. If no groups show apparent signals, then the weights will be approximately equal. The only restriction on the procedure is that the number of groups be small, relative to the total number of tests performed.
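The general idea of weighting hypotheses in a multiple testing procedure, as discussed in the abstract above, can be sketched with a weighted Bonferroni rule: each test is compared against alpha times its weight divided by the number of tests, with weights normalized to average 1. This is a generic textbook-style sketch and not the authors' procedure; in particular, the weights here are supplied by hand, whereas the abstract describes estimating a weight per group from the observed test statistics. The function name and the example values are assumptions.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return a list of booleans: True where the test is rejected.

    Weights are rescaled to have mean 1 so that the thresholds
    alpha * w_i / m sum to alpha across all m tests.
    """
    m = len(p_values)
    if m == 0 or len(weights) != m:
        raise ValueError("p_values and weights must be non-empty and equal in length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    mean_w = sum(weights) / m
    return [p <= alpha * (w / mean_w) / m for p, w in zip(p_values, weights)]


if __name__ == "__main__":
    p = [0.02, 0.004, 0.3, 0.6]
    # Hypothetical grouping: the first two tests belong to an up-weighted group.
    print(weighted_bonferroni(p, [1.8, 1.8, 0.2, 0.2]))
    print(weighted_bonferroni(p, [1.0, 1.0, 1.0, 1.0]))  # plain Bonferroni
```

Up-weighting a group that contains signals lowers the bar for its tests at the price of a stricter threshold elsewhere, which is the trade-off the abstract refers to.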

11.
Genome‐wide association (GWA) studies have proved to be extremely successful in identifying novel common polymorphisms contributing effects to the genetic component underlying complex traits. Nevertheless, one source of, as yet, undiscovered genetic determinants of complex traits are those mediated through the effects of rare variants. With the increasing availability of large‐scale re‐sequencing data for rare variant discovery, we have developed a novel statistical method for the detection of complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for the identification of rare variant associations in population‐based genetic studies when data are available from re‐sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re‐sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for future discovery of genetic determinants of complex traits. Genet. Epidemiol. 34: 188–193, 2010. © 2009 Wiley‐Liss, Inc.

12.
Li Q, Yu K. Genetic Epidemiology 2008, 32(3): 215-226
Hidden population substructure can cause population stratification and lead to false-positive findings in population-based genome-wide association (GWA) studies. Given a large panel of markers scanned in a GWA study, it becomes increasingly feasible to uncover the hidden population substructure within the study sample based on measured genotypes across the genome. Recognizing that population substructure can be displayed as clustered and/or continuous patterns of genetic variation, we propose a method that aims at the detection and correction of the confounding effect resulting from both patterns of population substructure. The proposed method is an extension of the EIGENSTRAT method (Price et al. [2006] Nat Genet 38:904-909). This approach is computationally feasible and easily applied to large-scale GWA studies. We show through simulation studies that, compared with the EIGENSTRAT method, the new method requires a smaller number of markers and yields a more appropriate correction for population stratification.

13.
With reductions in genotyping costs and the fast pace of improvements in genotyping technology, it is not uncommon for the individuals in a single study to undergo genotyping using several different platforms, where each platform may contain different numbers of markers selected via different criteria. For example, a set of cases and controls may be genotyped at markers in a small set of carefully selected candidate genes, and shortly thereafter, the same cases and controls may be used for a genome-wide single nucleotide polymorphism (SNP) association study. After such initial investigations, often, a subset of "interesting" markers is selected for validation or replication. Specifically, by validation, we refer to the investigation of associations between the selected subset of markers and the disease in independent data. However, it is not obvious how to choose the best set of markers for this validation. There may be a prior expectation that some sets of genotyping data are more likely to contain real associations. For example, it may be more likely for markers in plausible candidate genes to show disease associations than markers in a genome-wide scan. Hence, it would be desirable to select proportionally more markers from the candidate gene set. When a fixed number of markers are selected for validation, we propose an approach for identifying an optimal marker-selection configuration by basing the approach on minimizing the stratified false discovery rate. We illustrate this approach using a case-control study of colorectal cancer from Ontario, Canada, and we show that this approach leads to substantial reductions in the estimated false discovery rates in the Ontario dataset for the selected markers, as well as reductions in the expected false discovery rates for the proposed validation dataset.

14.
Meta-analyses of genome-wide association studies require numerous study partners to conduct pre-defined analyses and thus simple but efficient analysis plans. Potential differences between strata (e.g. men and women) are usually ignored, but often the question arises whether stratified analyses help to unravel the genetics of a phenotype or whether they unnecessarily increase the burden of analyses. To decide whether to stratify or not to stratify, we compare general analytical power computations for the overall analysis with those of stratified analyses, considering quantitative trait analyses and two strata. We also relate the stratification problem to interaction modeling and exemplify theoretical considerations on obesity and renal function genetics. We demonstrate that the overall analyses have better power compared to stratified analyses as long as the signals are pronounced in both strata with consistent effect direction. Stratified analyses are advantageous in the case of signals with zero (or very small) effect in one stratum and for signals with opposite effect direction in the two strata. Applying the joint test for a main SNP effect and SNP-stratum interaction beats both overall and stratified analyses regarding power, but involves more complex models. In summary, we recommend employing stratified analyses or the joint test to better understand the potential of strata-specific signals with opposite effect direction. Only after systematic genome-wide searches for opposite effect direction loci have been conducted will we know if such signals exist and to what extent stratified analyses can depict loci that otherwise are missed.

15.
On transferability of genome-wide tagSNPs   Total citations: 1, self-citations: 0, citations by others: 1
The question of tagging single nucleotide polymorphism (tagSNP) transferability is an important one because many ongoing and upcoming genome-wide association studies rely critically upon the validity and practical feasibility of using a universal core set of tagSNPs. A series of recent studies analyzed the performance of tagSNPs selected based on the HapMap. While these studies showed largely satisfactory transferability of the tagSNPs, they also reported that the level of transferability varies, sometimes substantially, especially when tagSNPs selected in one population were used in another distant population. We present a review of the literature about where and why tagSNP transferability may become a problem and suggest research directions that may help resolve it.

16.
Many complex diseases are likely to be a result of the interplay of genes and environmental exposures. The standard analysis in a genome-wide association study (GWAS) scans for main effects and ignores the potentially useful information in the available exposure data. Two recently proposed methods that exploit environmental exposure information involve a two-step analysis aimed at prioritizing the large number of SNPs tested to highlight those most likely to be involved in a G-E interaction. For example, Murcray et al. ([2009] Am J Epidemiol 169:219–226) proposed screening on a test that models the G-E association induced by an interaction in the combined case-control sample. Alternatively, Kooperberg and LeBlanc ([2008] Genet Epidemiol 32:255–263) suggested screening on genetic marginal effects. In both methods, SNPs that pass the respective screening step at a pre-specified significance threshold are followed up with a formal test of interaction in the second step. We propose a hybrid method that combines these two screening approaches by allocating a proportion of the overall genome-wide significance level to each test. We show that the Murcray et al. approach is often the most efficient method, but that the hybrid approach is a powerful and robust method for nearly any underlying model. As an example, for a GWAS of 1 million markers including a single true disease SNP with minor allele frequency of 0.15, and a binary exposure with prevalence 0.3, the Murcray, Kooperberg and hybrid methods are 1.90, 1.27, and 1.87 times as efficient, respectively, as the traditional case-control analysis to detect an interaction effect size of 2.0.

17.
Emily M. Statistics in Medicine 2012, 31(21): 2359-2373
Epistasis is often cited as the biological mechanism carrying the missing heritability in genome‐wide association studies. However, very few such studies have been reported in the literature. The low power of existing statistical methods is a potential explanation. Statistical procedures are also mainly based on the statistical definition of epistasis, which prevents the detection of SNP–SNP interactions that rely on some classes of epistatic models. In this paper, we propose a new statistic, called IndOR for independence‐based odds ratio, based on the biological definition of epistasis. We assume that epistasis modifies the dependency between the two causal SNPs, and we develop a Wald procedure to test this hypothesis. Our new statistic is compared with three statistical procedures in a large power study on simulated data sets. We use extensive simulations, based on 45 scenarios, to investigate the effect of three factors: the underlying disease model, the linkage disequilibrium, and the control‐to‐case ratio. We demonstrate that our new test has the ability to detect a wider range of epistatic models. Furthermore, our new statistical procedure is remarkably powerful when the two loci are linked and when the control‐to‐case ratio is higher than 1. The application of our new statistic to the Wellcome Trust Case Control Consortium data set on Crohn's disease supports our results on simulated data. Our new test, IndOR, detects a previously reported interaction with more power. Furthermore, a new combination of variants has been detected by our new test as significantly associated with Crohn's disease. Copyright © 2012 John Wiley & Sons, Ltd.

18.
Even in large-scale genome-wide association studies (GWASs), only a fraction of the true associations are detected at the genome-wide significance level. When few or no associations reach the significance threshold, one strategy is to follow up on the most promising candidates, i.e. the single nucleotide polymorphisms (SNPs) with the smallest association-test P-values, by genotyping them in additional studies. In this communication, we propose an overall test for GWASs that analyzes the SNPs with the most promising P-values simultaneously and therefore allows an early assessment of whether the follow-up of the selected SNPs is likely promising. We theoretically derive the properties of the proposed overall test under the null hypothesis and assess its power based on simulation studies. An application to a GWAS for chronic obstructive pulmonary disease suggests that there are true association signals among the top SNPs and that an additional follow-up study is promising.

19.
Genome-wide association studies (GWAS) routinely apply principal component analysis (PCA) to infer population structure within a sample to correct for confounding due to ancestry. GWAS implementation of PCA uses tens of thousands of single-nucleotide polymorphisms (SNPs) to infer structure, despite the fact that only a small fraction of such SNPs provides useful information on ancestry. The identification of this reduced set of ancestry-informative markers (AIMs) from a GWAS has practical value; for example, researchers can genotype the AIM set to correct for potential confounding due to ancestry in follow-up studies that utilize custom SNP or sequencing technology. We propose a novel technique to identify AIMs from genome-wide SNP data using sparse PCA. The procedure uses penalized regression methods to identify those SNPs in a genome-wide panel that significantly contribute to the principal components while encouraging SNPs that provide negligible loadings to vanish from the analysis. We found that sparse PCA leads to negligible loss of ancestry information compared to traditional PCA analysis of genome-wide SNP data. We further demonstrate the value of sparse PCA for AIM selection using real data from the International HapMap Project and a genome-wide study of inflammatory bowel disease. We have implemented our approach in open-source R software for public use.

20.
Large-scale genome-wide association studies (GWAS) have become feasible recently because of the development of bead and chip technology. However, the success of GWAS partially depends on statistical methods that are able to manage and analyze this sort of large-scale data. Currently, the commonly used tests for GWAS include the Cochran-Armitage trend test, the allelic χ² test, the genotypic χ² test, the haplotypic χ² test, and the multi-marker genotypic χ² test, among others. From a methodological point of view, it is a great challenge to improve the power of commonly used tests, since these tests are commonly used precisely because they are already among the most powerful tests. In this article, we propose an improved score test that is uniformly more powerful than the score test based on the generalized linear model. Since the score test based on the generalized linear model includes the aforementioned commonly used tests as its special cases, our proposed improved score test is thus uniformly more powerful than these commonly used tests. We evaluate the performance of the improved score test by simulation studies and application to a real data set. Our results show that the power increases of the improved score test over the score test cannot be neglected in most cases.
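For readers unfamiliar with the first test named in the abstract above, the following is a minimal sketch of the standard Cochran-Armitage trend test for a 2 x 3 table of genotype counts. It illustrates the conventional test only, not the improved score test the article proposes; the function name, the additive scores (0, 1, 2), and the example counts are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Return (chi-square statistic with 1 df, two-sided p-value).

    cases, controls: genotype counts, e.g. for 0, 1, and 2 copies of an allele.
    scores:          per-genotype weights; (0, 1, 2) encodes an additive trend.
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")
    r1, r2 = sum(cases), sum(controls)        # row totals
    n = r1 + r2
    col = [a + b for a, b in zip(cases, controls)]  # column totals
    k = len(scores)
    # Trend statistic: weighted difference between cases and controls.
    t_stat = sum(t * (a * r2 - b * r1)
                 for t, a, b in zip(scores, cases, controls))
    # Variance of the trend statistic under the null hypothesis.
    diag = sum(t * t * c * (n - c) for t, c in zip(scores, col))
    cross = sum(scores[i] * scores[j] * col[i] * col[j]
                for i in range(k) for j in range(i + 1, k))
    var = (r1 * r2 / n) * (diag - 2 * cross)
    if var <= 0:
        raise ValueError("degenerate table: variance is not positive")
    z = t_stat / math.sqrt(var)
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return z * z, p_value


if __name__ == "__main__":
    chi2, p = cochran_armitage_trend([10, 20, 30], [30, 20, 10])
    print(f"chi2 = {chi2:.3f}, p = {p:.3g}")
```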
