首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
On transferability of genome-wide tagSNPs   总被引:1,自引:0,他引:1  
The question of tagging single nucleotide polymorphism (tagSNP) transferability is an important one because many ongoing and upcoming Genome-Wide Association studies rely critically upon the validity, and practical feasibility of using a universal core set of tagSNPs. A series of recent studies analyzed performance of tagSNPs selected based on the HapMap. While these studies showed largely satisfactory transferability of the tagSNPs, they also reported that the level of transferability varies, substantively sometimes, especially when tagSNPs selected in one population were used in another distant population. We present a review of the literature about where and why tagSNP transferability may become a problem and suggest research directions that may help the resolution.  相似文献   

2.
The interpretation of the results of large association studies encompassing much or all of the human genome faces the fundamental statistical problem that a correspondingly large number of single nucleotide polymorphisms markers will be spuriously flagged as significant. A common method of dealing with these false positives is to raise the significance level for the individual tests for association of each marker. Any such adjustment for multiple testing is ultimately based on a more or less precise estimate for the actual overall type I error probability. We estimate this probability for association tests for correlated markers and show that it depends in a nonlinear way on the significance level for the individual tests. This dependence of the effective number of tests is not taken into account by existing multiple-testing corrections, leading to widely overestimated results. We demonstrate a simple correction for multiple testing, which can easily be calculated from the pairwise correlation and gives far more realistic estimates for the effective number of tests than previous formulae. The calculation is considerably faster than with other methods and hence applicable on a genome-wide scale. The efficacy of our method is shown on a constructed example with highly correlated markers as well as on real data sets, including a full genome scan where a conservative estimate only 8% above the permutation estimate is obtained in about 1% of computation time. As the calculation is based on pairwise correlations between markers, it can be performed at the stage of study design using public databases.  相似文献   

3.
With the establishment of large consortiums of researchers, genome-wide association (GWA) studies have become increasingly popular and feasible. Although most of these association studies focus on unrelated individuals, a lot of advantages can be exploited by including families in the analysis as well. To overcome the additional genotyping cost, multi-stage designs are particularly useful. In this article, I offer a perspective view on genome-wide family-based association analyses, both within a model-based and model-free paradigm. I highlight how multi-stage designs and analysis techniques, which are quite popular in clinical epidemiology, can enter GWA settings. I furthermore discuss how they have proven successful in reducing analysis complexity, and in overcoming one of the most cumbersome statistical hurdles in the genome-wide context, namely controlling increased false positives due to multiple testing.  相似文献   

4.
Optimal designs for two-stage genome-wide association studies   总被引:3,自引:0,他引:3  
Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in stage 2. We describe a strategy for designing cost-effective two-stage GWA studies. Our strategy preserves much of the power of the corresponding one-stage design and minimizes the genotyping cost of the study while allowing for differences in per genotyping cost between stages 1 and 2. We show that the ratio of stage 2 to stage 1 per genotype cost can strongly influence both the optimal design and the genotyping cost of the study. Increasing the stage 2 per genotype cost shifts more of the genotyping and study cost to stage 1, and increases the cost of the study. This higher cost can be partially mitigated by adopting a design with reduced power while preserving the false positive rate or by increasing the false positive rate while preserving power. For example, reducing the power preserved in the two-stage design from 99 to 95% that of the one-stage design decreases the two-stage study cost by approximately 15%. Alternatively, the same cost savings can be had by relaxing the false positive rate by 2.5-fold, for example from 1/300,000 to 2.5/300,000, while retaining the same power.  相似文献   

5.
Qin H  Zhu X 《Genetic epidemiology》2012,36(3):235-243
When dense markers are available, one can interrogate almost every common variant across the genome via imputation and single nucleotide polymorphism (SNP) test, which has become a routine in current genome-wide association studies (GWASs). As a complement, admixture mapping exploits the long-range linkage disequilibrium (LD) generated by admixture between genetically distinct ancestral populations. It is then questionable whether admixture mapping analysis is still necessary in detecting the disease associated variants in admixed populations. We argue that admixture mapping is able to reduce the burden of massive comparisons in GWASs; it therefore can be a powerful tool to locate the disease variants with substantial allele frequency differences between ancestral populations. In this report we studied a two-stage approach, where candidate regions are defined by conducting admixture mapping at stage 1, and single SNP association tests are followed at stage 2 within the candidate regions defined at stage 1. We first established the genome-wide significance levels corresponding to the criteria to define the candidate regions at stage 1 by simulations. We next compared the power of the two-stage approach with direct association analysis. Our simulations suggest that the two-stage approach can be more powerful than the standard genome-wide association analysis when the allele frequency difference of a causal variant in ancestral populations, is larger than 0.4. Our conclusion is consistent with a theoretical prediction by Risch and Tang ([2006] Am J Hum Genet 79:S254). Surprisingly, our study also suggests that power can be improved when we use less strict criteria to define the candidate regions at stage 1.  相似文献   

6.
Li C  Li M  Long JR  Cai Q  Zheng W 《Genetic epidemiology》2008,32(5):387-395
Genome-wide association (GWA) studies have recently emerged as a major approach to gene discovery for many complex diseases. Since GWA scans are expensive, cost efficiency is an important factor to consider in study design. However, it often requires extensive and time-consuming computer simulations to compare cost efficiency across different single nucleotide polymorphism (SNP) chips. Here, we propose two simulation-free approaches to cost efficiency comparisons across SNP chips. In the first method, the overall power under a given disease model is calculated for each SNP chip and various sample sizes. Then SNP chips can be compared with respect to the sample sizes required to achieve the same level of power. In the second method, for a desired level of genomic coverage, the effective r(2) threshold values are calculated for each SNP chip. Since r(2) is inversely proportional to the sample size to achieve the same power, the required sample sizes can then be compared among SNP chips. These two methods are complementary to each other. The first approach provides direct power comparisons, but it requires information on disease model and may not be reliable for SNP chips that contain many non-HapMap SNPs. The second approach allows sample size comparisons based on the coverage of SNP chips, and it can be modified for SNP chips that contain non-HapMap SNPs. These methods are particularly relevant for large epidemiological studies in which enough subjects are available for GWA screening and follow-up stages. We illustrate these approaches using five currently available whole genome SNP chips.  相似文献   

7.
Genome-wide association studies are carried out to identify unknown genes for a complex trait. Polymorphisms showing the most statistically significant associations are reported and followed up in subsequent confirmatory studies. In addition to the test of association, the statistical analysis provides point estimates of the relationship between the genotype and phenotype at each polymorphism, typically an odds ratio in case-control association studies. The statistical significance of the test and the estimator of the odds ratio are completely correlated. Selecting the most extreme statistics is equivalent to selecting the most extreme odds ratios. The value of the estimator, given the value of the statistical significance depends on the standard error of the estimator and the power of the study. This report shows that when power is low, estimates of the odds ratio from a genome-wide association study, or any large-scale association study, will be upwardly biased. Genome-wide association studies are often underpowered given the low alpha levels required to declare statistical significance and the small individual genetic effects known to characterize complex traits. Factors such as low allele frequency, inadequate sample size and weak genetic effects contribute to large standard errors in the odds ratio estimates, low power and upwardly biased odds ratios. Studies that have high power to detect an association with the true odds ratio will have little or no bias, regardless of the statistical significance threshold. The results have implications for the interpretation of genome-wide association analysis and the planning of subsequent confirmatory stages.  相似文献   

8.
Neighboring common polymorphisms are often correlated (in linkage disequilibrium (LD)) as a result of shared ancestry. An association between a polymorphism and a disease trait may therefore be the indirect result of a correlated functional variant, and identifying the true causal variant(s) from an initial disease association is a major challenge in genetic association studies. Here, we present a method to estimate the sample size needed to discriminate between a functional variant of a given allele frequency and effect size, and other correlated variants. The sample size required to conduct such fine‐scale mapping is typically 1–4 times larger than required to detect the initial association. Association studies in populations with different LD patterns can substantially improve the power to isolate the causal variant. An online tool to perform these calculations is available at http://moya.srl.cam.ac.uk/ocac/FineMappingPowerCalculator.html . Genet. Epidemiol. 34:463–468, 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

9.
Despite the success of genome-wide association studies, much of the genetic contribution to complex human traits is still unexplained. One potential source of genetic variation that may contribute to this "missing heritability" is that which differs in magnitude and/or direction between males and females, which could result from sexual dimorphism in gene expression. Such sex-differentiated effects are common in model organisms, and are becoming increasingly evident in human complex traits through large-scale male- and female-specific meta-analyses. In this article, we review the methodology for meta-analysis of sex-specific genome-wide association studies, and propose a sex-differentiated test of association with quantitative or dichotomous traits, which allows for heterogeneity of allelic effects between males and females. We perform detailed simulations to compare the power of the proposed sex-differentiated meta-analysis with the more traditional "sex-combined" approach, which is ambivalent to gender. The results of this study highlight only a small loss in power for the sex-differentiated meta-analysis when the allelic effects of the causal variant are the same in males and females. However, over a range of models of heterogeneity in allelic effects between genders, our sex-differentiated meta-analysis strategy offers substantial gains in power, and thus has the potential to discover novel loci contributing effects to complex human traits with existing genome-wide association data.  相似文献   

10.
Promising findings from genetic association studies are commonly presented with two distinct figures: one gives the association study results and the other indicates linkage disequilibrium (LD) between genetic markers in the region(s) of interest. Fully interpreting the results of such studies requires synthesizing the information in these figures, which is generally done in a subjective and unsystematic manner. Here we present a method to formally combine association results and LD and display them in the same figure; we have developed a freely available web‐based application that can be used to generate figures to display the combined data. To demonstrate this approach we apply it to fine mapping data from the prostate cancer 8q24 loci. Combining these two sources of information in a single figure allows one to more clearly assess patterns of association, facilitating the interpretation of genome‐wide and fine mapping data and improving our ability to localize causal variants. Genet. Epidemiol. 33:599–603, 2009. © 2009 Wiley‐Liss, Inc.  相似文献   

11.
Interpretation of dense single nucleotide polymorphism (SNP) follow-up of genome-wide association or linkage scan signals can be facilitated by establishing expectation for the behaviour of primary mapping signals upon fine-mapping, under both null and alternative hypotheses. We examined the inferences that can be made regarding the posterior probability of a real genetic effect and considered different disease-mapping strategies and prior probabilities of association. We investigated the impact of the extent of linkage disequilibrium between the disease SNP and the primary analysis signal and the extent to which the disease gene can be physically localised under these scenarios. We found that large increases in significance (>2 orders of magnitude) appear in the exclusive domain of genuine genetic effects, especially in the follow-up of genome-wide association scans or consensus regions from multiple linkage scans. Fine-mapping significant association signals that reside directly under linkage peaks yield little improvement in an already high posterior probability of a real effect. Following fine-mapping, those signals that increase in significance also demonstrate improved localisation. We found local linkage disequiliptium patterns around the primary analysis signal(s) and tagging efficacy of typed markers to play an important role in determining a suitable interval for fine-mapping. Our findings help inform the interpretation and design of dense SNP-mapping follow-up studies, thus facilitating discrimination between a genuine genetic effect and chance fluctuation (false positive).  相似文献   

12.
Improving power in genome-wide association studies: weights tip the scale   总被引:3,自引:0,他引:3  
The potential of genome-wide association analysis can only be realized when they have power to detect signals despite the detrimental effect of multiple testing on power. We develop a weighted multiple testing procedure that facilitates the input of prior information in the form of groupings of tests. For each group a weight is estimated from the observed test statistics within the group. Differentially weighting groups improves the power to detect signals in likely groupings. The advantage of the grouped-weighting concept, over fixed weights based on prior information, is that it often leads to an increase in power even if many of the groupings are not correlated with the signal. Being data dependent, the procedure is remarkably robust to poor choices in groupings. Power is typically improved if one (or more) of the groups clusters multiple tests with signals, yet little power is lost when the groupings are totally random. If there is no apparent signal in a group, relative to a group that appears to have several tests with signals, the former group will be down-weighted relative to the latter. If no groups show apparent signals, then the weights will be approximately equal. The only restriction on the procedure is that the number of groups be small, relative to the total number of tests performed.  相似文献   

13.
Genome-wide association (GWA) studies have been extremely successful in identifying novel loci contributing effects to a wide range of complex human traits. However, despite this success, the joint marginal effects of these loci account for only a small proportion of the heritability of these traits. Interactions between variants in different loci are not typically modelled in traditional GWA analysis, but may account for some of the missing heritability in humans, as they do in other model organisms. One of the key challenges in performing gene-gene interaction studies is the computational burden of the analysis. We propose a two-stage interaction analysis strategy to address this challenge in the context of both quantitative traits and dichotomous phenotypes. We have performed simulations to demonstrate only a negligible loss in power of this two-stage strategy, while minimizing the computational burden. Application of this interaction strategy to GWA studies of T2D and obesity highlights potential novel signals of association, which warrant follow-up in larger cohorts.  相似文献   

14.
Most findings from genome‐wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: decaying as a function of r4, where r2 is the usual correlation between the causal and marker loci, in contrast to the well‐known result that power to detect a multiplicative effect decays as a function of r2. We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as for dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely. Genet. Epidemiol. 35: 278‐290, 2011. © 2011 Wiley‐Liss, Inc.  相似文献   

15.
Many complex diseases are likely to be a result of the interplay of genes and environmental exposures. The standard analysis in a genome-wide association study (GWAS) scans for main effects and ignores the potentially useful information in the available exposure data. Two recently proposed methods that exploit environmental exposure information involve a two-step analysis aimed at prioritizing the large number of SNPs tested to highlight those most likely to be involved in a GE interaction. For example, Murcray et al. ([2009] Am J Epidemiol 169:219–226) proposed screening on a test that models the G-E association induced by an interaction in the combined case-control sample. Alternatively, Kooperberg and LeBlanc ([2008] Genet Epidemiol 32:255–263) suggested screening on genetic marginal effects. In both methods, SNPs that pass the respective screening step at a pre-specified significance threshold are followed up with a formal test of interaction in the second step. We propose a hybrid method that combines these two screening approaches by allocating a proportion of the overall genomewide significance level to each test. We show that the Murcray et al. approach is often the most efficient method, but that the hybrid approach is a powerful and robust method for nearly any underlying model. As an example, for a GWAS of 1 million markers including a single true disease SNP with minor allele frequency of 0.15, and a binary exposure with prevalence 0.3, the Murcray, Kooperberg and hybrid methods are 1.90, 1.27, and 1.87 times as efficient, respectively, as the traditional case-control analysis to detect an interaction effect size of 2.0.  相似文献   

16.
Population isolates may be particularly useful for association studies of complex traits. This utility, however, largely depends on the transferability of tag SNPs chosen from reference samples, such as HapMap, to samples from such populations. Factors that characterize population isolates, such as widespread genetic drift, could impede such transferability. In this report, we show that tag SNPs chosen from HapMap perform well in several population isolates; this is true even for populations that differ substantially from the HapMap sample either in levels of linkage disequilibrium or in SNP allele frequency distributions.  相似文献   

17.
Li Q  Yu K 《Genetic epidemiology》2008,32(3):215-226
Hidden population substructure can cause population stratification and lead to false-positive findings in population-based genome-wide association (GWA) studies. Given a large panel of markers scanned in a GWA study, it becomes increasingly feasible to uncover the hidden population substructure within the study sample based on measured genotypes across the genome. Recognizing that population substructure can be displayed as clustered and/or continuous patterns of genetic variation, we propose a method that aims at the detection and correction of the confounding effect resulting from both patterns of population substructure. The proposed method is an extension of the EIGENSTRAT method (Price et al. [2006] Nat Genet 38:904-909). This approach is computationally feasible and easily applied to large-scale GWA studies. We show through simulation studies that, compared with the EIGENSTRAT method, the new method requires a smaller number of markers and yields a more appropriate correction for population stratification.  相似文献   

18.
Even in large-scale genome-wide association studies (GWASs), only a fraction of the true associations are detected at the genome-wide significance level. When few or no associations reach the significance threshold, one strategy is to follow up on the most promising candidates, i.e. the single nucleotide polymorphisms (SNPs) with the smallest association-test P-values, by genotyping them in additional studies. In this communication, we propose an overall test for GWASs that analyzes the SNPs with the most promising P-values simultaneously and therefore allows an early assessment of whether the follow-up of the selected SNPs is likely promising. We theoretically derive the properties of the proposed overall test under the null hypothesis and assess its power based on simulation studies. An application to a GWAS for chronic obstructive pulmonary disease suggests that there are true association signals among the top SNPs and that an additional follow-up study is promising.  相似文献   

19.
Emily M 《Statistics in medicine》2012,31(21):2359-2373
Epistasis is often cited as the biological mechanism carrying the missing heritability in genome‐wide association studies. However, there is a very few number of studies reported in the literature. The low power of existing statistical methods is a potential explanation. Statistical procedures are also mainly based on the statistical definition of epistasis that prevents from detecting SNP–SNP interactions that rely on some classes of epistatic models. In this paper, we propose a new statistic, called IndOR for independence‐based odds ratio, based on the biological definition of epistasis. We assume that epistasis modifies the dependency between the two causal SNPs, and we develop a Wald procedure to test such hypothesis. Our new statistic is compared with three statistical procedures in a large power study on simulated data sets. We use extensive simulations, based on 45 scenarios, to investigate the effect of three factors: the underlying disease model, the linkage disequilibrium, and the control‐to‐case ratio. We demonstrate that our new test has the ability to detect a wider range of epistatic models. Furthermore, our new statistical procedure is remarkably powerful when the two loci are linked and when the control‐to‐case ratio is higher than 1. The application of our new statistic on the Wellcome Trust Case Control Consortium data set on Crohn's disease enhances our results on simulated data. Our new test, IndOR, catches previously reported interaction with more power. Furthermore, a new combination of variant has been detected by our new test as significantly associated with Crohn's disease. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

20.
Meta-analyses of genome-wide association studies require numerous study partners to conduct pre-defined analyses and thus simple but efficient analyses plans. Potential differences between strata (e.g. men and women) are usually ignored, but often the question arises whether stratified analyses help to unravel the genetics of a phenotype or if they unnecessarily increase the burden of analyses. To decide whether to stratify or not to stratify, we compare general analytical power computations for the overall analysis with those of stratified analyses considering quantitative trait analyses and two strata. We also relate the stratification problem to interaction modeling and exemplify theoretical considerations on obesity and renal function genetics. We demonstrate that the overall analyses have better power compared to stratified analyses as long as the signals are pronounced in both strata with consistent effect direction. Stratified analyses are advantageous in the case of signals with zero (or very small) effect in one stratum and for signals with opposite effect direction in the two strata. Applying the joint test for a main SNP effect and SNP-stratum interaction beats both overall and stratified analyses regarding power, but involves more complex models. In summary, we recommend to employ stratified analyses or the joint test to better understand the potential of strata-specific signals with opposite effect direction. Only after systematic genome-wide searches for opposite effect direction loci have been conducted, we will know if such signals exist and to what extent stratified analyses can depict loci that otherwise are missed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号