首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Even in large-scale genome-wide association studies (GWASs), only a fraction of the true associations are detected at the genome-wide significance level. When few or no associations reach the significance threshold, one strategy is to follow up on the most promising candidates, i.e. the single nucleotide polymorphisms (SNPs) with the smallest association-test P-values, by genotyping them in additional studies. In this communication, we propose an overall test for GWASs that analyzes the SNPs with the most promising P-values simultaneously and therefore allows an early assessment of whether the follow-up of the selected SNPs is likely promising. We theoretically derive the properties of the proposed overall test under the null hypothesis and assess its power based on simulation studies. An application to a GWAS for chronic obstructive pulmonary disease suggests that there are true association signals among the top SNPs and that an additional follow-up study is promising.  相似文献   

2.
Optimal designs for two-stage genome-wide association studies   总被引:3,自引:0,他引:3  
Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in stage 2. We describe a strategy for designing cost-effective two-stage GWA studies. Our strategy preserves much of the power of the corresponding one-stage design and minimizes the genotyping cost of the study while allowing for differences in per genotyping cost between stages 1 and 2. We show that the ratio of stage 2 to stage 1 per genotype cost can strongly influence both the optimal design and the genotyping cost of the study. Increasing the stage 2 per genotype cost shifts more of the genotyping and study cost to stage 1, and increases the cost of the study. This higher cost can be partially mitigated by adopting a design with reduced power while preserving the false positive rate or by increasing the false positive rate while preserving power. For example, reducing the power preserved in the two-stage design from 99 to 95% that of the one-stage design decreases the two-stage study cost by approximately 15%. Alternatively, the same cost savings can be had by relaxing the false positive rate by 2.5-fold, for example from 1/300,000 to 2.5/300,000, while retaining the same power.  相似文献   

3.
Large-scale genome-wide association studies (GWAS) have become feasible recently because of the development of bead and chip technology. However, the success of GWAS partially depends on the statistical methods that are able to manage and analyze this sort of large-scale data. Currently, the commonly used tests for GWAS include the Cochran-Armitage trend test, the allelic χ(2) test, the genotypic χ(2) test, the haplotypic χ(2) test, and the multi-marker genotypic χ(2) test among others. From a methodological point of view, it is a great challenge to improve the power of commonly used tests, since these tests are commonly used precisely because they are already among the most powerful tests. In this article, we propose an improved score test that is uniformly more powerful than the score test based on the generalized linear model. Since the score test based on the generalized linear model includes the aforementioned commonly used tests as its special cases, our proposed improved score test is thus uniformly more powerful than these commonly used tests. We evaluate the performance of the improved score test by simulation studies and application to a real data set. Our results show that the power increases of the improved score test over the score test cannot be neglected in most cases.  相似文献   

4.
With the establishment of large consortiums of researchers, genome-wide association (GWA) studies have become increasingly popular and feasible. Although most of these association studies focus on unrelated individuals, a lot of advantages can be exploited by including families in the analysis as well. To overcome the additional genotyping cost, multi-stage designs are particularly useful. In this article, I offer a perspective view on genome-wide family-based association analyses, both within a model-based and model-free paradigm. I highlight how multi-stage designs and analysis techniques, which are quite popular in clinical epidemiology, can enter GWA settings. I furthermore discuss how they have proven successful in reducing analysis complexity, and in overcoming one of the most cumbersome statistical hurdles in the genome-wide context, namely controlling increased false positives due to multiple testing.  相似文献   

5.
Li C  Li M  Long JR  Cai Q  Zheng W 《Genetic epidemiology》2008,32(5):387-395
Genome-wide association (GWA) studies have recently emerged as a major approach to gene discovery for many complex diseases. Since GWA scans are expensive, cost efficiency is an important factor to consider in study design. However, it often requires extensive and time-consuming computer simulations to compare cost efficiency across different single nucleotide polymorphism (SNP) chips. Here, we propose two simulation-free approaches to cost efficiency comparisons across SNP chips. In the first method, the overall power under a given disease model is calculated for each SNP chip and various sample sizes. Then SNP chips can be compared with respect to the sample sizes required to achieve the same level of power. In the second method, for a desired level of genomic coverage, the effective r(2) threshold values are calculated for each SNP chip. Since r(2) is inversely proportional to the sample size to achieve the same power, the required sample sizes can then be compared among SNP chips. These two methods are complementary to each other. The first approach provides direct power comparisons, but it requires information on disease model and may not be reliable for SNP chips that contain many non-HapMap SNPs. The second approach allows sample size comparisons based on the coverage of SNP chips, and it can be modified for SNP chips that contain non-HapMap SNPs. These methods are particularly relevant for large epidemiological studies in which enough subjects are available for GWA screening and follow-up stages. We illustrate these approaches using five currently available whole genome SNP chips.  相似文献   

6.
Li Q  Yu K 《Genetic epidemiology》2008,32(3):215-226
Hidden population substructure can cause population stratification and lead to false-positive findings in population-based genome-wide association (GWA) studies. Given a large panel of markers scanned in a GWA study, it becomes increasingly feasible to uncover the hidden population substructure within the study sample based on measured genotypes across the genome. Recognizing that population substructure can be displayed as clustered and/or continuous patterns of genetic variation, we propose a method that aims at the detection and correction of the confounding effect resulting from both patterns of population substructure. The proposed method is an extension of the EIGENSTRAT method (Price et al. [2006] Nat Genet 38:904-909). This approach is computationally feasible and easily applied to large-scale GWA studies. We show through simulation studies that, compared with the EIGENSTRAT method, the new method requires a smaller number of markers and yields a more appropriate correction for population stratification.  相似文献   

7.
Though multiple interacting loci are likely involved in the etiology of complex diseases, early genome-wide association studies (GWAS) have depended on the detection of the marginal effects of each locus. Here, we evaluate the power of GWAS in the presence of two linked and potentially associated causal loci for several models of interaction between them and find that interacting loci may give rise to marginal relative risks that are not generally considered in a one-locus model. To derive power under realistic situations, we use empirical data generated by the HapMap ENCODE project for both allele frequencies and linkage disequilibrium (LD) structure. The power is also evaluated in situations where the causal single nucleotide polymorphisms (SNPs) may not be genotyped, but rather detected by proxy using a SNP in LD. A common simplification for such power computations assumes that the sample size necessary to detect the effect at the tSNP is the sample size necessary to detect the causal locus directly divided by the LD measure r(2) between the two. This assumption, which we call the "proportionality assumption", is a simplification of the many factors that contribute to the strength of association at a marker, and has recently been criticized as unreasonable (Terwilliger and Hiekkalinna [2006] Eur J Hum Genet 14(4):426-437), in particular in the presence of interacting and associated loci. We find that this assumption does not introduce much error in single locus models of disease, but may do so in so in certain two-locus models.  相似文献   

8.
Association analysis, with the aim of investigating genetic variations, is designed to detect genetic associations with observable traits, which has played an increasing part in understanding the genetic basis of diseases. Among these methods, haplotype‐based association studies are believed to possess prominent advantages, especially for the rare diseases in case‐control studies. However, when modeling these haplotypes, they are subjected to statistical problems caused by rare haplotypes. Fortunately, haplotype clustering offers an appealing solution. In this research, we have developed a new befitting haplotype similarity for “affinity propagation” clustering algorithm, which can account for the rare haplotypes primely, so as to control for the issue on degrees of freedom. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers merits in detecting disease‐marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34: 633–641, 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

9.
Despite the success of genome-wide association studies, much of the genetic contribution to complex human traits is still unexplained. One potential source of genetic variation that may contribute to this "missing heritability" is that which differs in magnitude and/or direction between males and females, which could result from sexual dimorphism in gene expression. Such sex-differentiated effects are common in model organisms, and are becoming increasingly evident in human complex traits through large-scale male- and female-specific meta-analyses. In this article, we review the methodology for meta-analysis of sex-specific genome-wide association studies, and propose a sex-differentiated test of association with quantitative or dichotomous traits, which allows for heterogeneity of allelic effects between males and females. We perform detailed simulations to compare the power of the proposed sex-differentiated meta-analysis with the more traditional "sex-combined" approach, which is ambivalent to gender. The results of this study highlight only a small loss in power for the sex-differentiated meta-analysis when the allelic effects of the causal variant are the same in males and females. However, over a range of models of heterogeneity in allelic effects between genders, our sex-differentiated meta-analysis strategy offers substantial gains in power, and thus has the potential to discover novel loci contributing effects to complex human traits with existing genome-wide association data.  相似文献   

10.
Many complex diseases are likely to be a result of the interplay of genes and environmental exposures. The standard analysis in a genome-wide association study (GWAS) scans for main effects and ignores the potentially useful information in the available exposure data. Two recently proposed methods that exploit environmental exposure information involve a two-step analysis aimed at prioritizing the large number of SNPs tested to highlight those most likely to be involved in a GE interaction. For example, Murcray et al. ([2009] Am J Epidemiol 169:219–226) proposed screening on a test that models the G-E association induced by an interaction in the combined case-control sample. Alternatively, Kooperberg and LeBlanc ([2008] Genet Epidemiol 32:255–263) suggested screening on genetic marginal effects. In both methods, SNPs that pass the respective screening step at a pre-specified significance threshold are followed up with a formal test of interaction in the second step. We propose a hybrid method that combines these two screening approaches by allocating a proportion of the overall genomewide significance level to each test. We show that the Murcray et al. approach is often the most efficient method, but that the hybrid approach is a powerful and robust method for nearly any underlying model. As an example, for a GWAS of 1 million markers including a single true disease SNP with minor allele frequency of 0.15, and a binary exposure with prevalence 0.3, the Murcray, Kooperberg and hybrid methods are 1.90, 1.27, and 1.87 times as efficient, respectively, as the traditional case-control analysis to detect an interaction effect size of 2.0.  相似文献   

11.
Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provide guidance on the potential strengths and weaknesses of our proposed gene-set analyses.  相似文献   

12.
Meta-analyses of genome-wide association studies require numerous study partners to conduct pre-defined analyses and thus simple but efficient analyses plans. Potential differences between strata (e.g. men and women) are usually ignored, but often the question arises whether stratified analyses help to unravel the genetics of a phenotype or if they unnecessarily increase the burden of analyses. To decide whether to stratify or not to stratify, we compare general analytical power computations for the overall analysis with those of stratified analyses considering quantitative trait analyses and two strata. We also relate the stratification problem to interaction modeling and exemplify theoretical considerations on obesity and renal function genetics. We demonstrate that the overall analyses have better power compared to stratified analyses as long as the signals are pronounced in both strata with consistent effect direction. Stratified analyses are advantageous in the case of signals with zero (or very small) effect in one stratum and for signals with opposite effect direction in the two strata. Applying the joint test for a main SNP effect and SNP-stratum interaction beats both overall and stratified analyses regarding power, but involves more complex models. In summary, we recommend to employ stratified analyses or the joint test to better understand the potential of strata-specific signals with opposite effect direction. Only after systematic genome-wide searches for opposite effect direction loci have been conducted, we will know if such signals exist and to what extent stratified analyses can depict loci that otherwise are missed.  相似文献   

13.
The impact of erroneous genotypes having passed standard quality control (QC) can be severe in genome-wide association studies, genotype imputation, and estimation of heritability and prediction of genetic risk based on single nucleotide polymorphisms (SNP). To detect such genotyping errors, a simple two-locus QC method, based on the difference in test statistic of association between single SNPs and pairs of SNPs, was developed and applied. The proposed approach could detect many problematic SNPs with statistical significance even when standard single SNP QC analyses fail to detect them in real data. Depending on the data set used, the number of erroneous SNPs that were not filtered out by standard single SNP QC but detected by the proposed approach varied from a few hundred to thousands. Using simulated data, it was shown that the proposed method was powerful and performed better than other tested existing methods. The power of the proposed approach to detect erroneous genotypes was ~80% for a 3% error rate per SNP. This novel QC approach is easy to implement and computationally efficient, and can lead to a better quality of genotypes for subsequent genotype-phenotype investigations.  相似文献   

14.
Meta-analysis of genome-wide association studies involves testing single nucleotide polymorphisms (SNPs) using summary statistics that are weighted sums of site-specific score or Wald statistics. This approach avoids having to pool individual-level data. We describe the weights that maximize the power of the summary statistics. For small effect-sizes, any choice of weights yields summary Wald and score statistics with the same power, and the optimal weights are proportional to the square roots of the sites' Fisher information for the SNP's regression coefficient. When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its standard deviation. We give simple approximations to the optimal weights for various phenotypes, and show that weights proportional to the square roots of study sizes are suboptimal for data from case-control studies with varying case-control ratios, for quantitative trait data when the trait variance differs across sites, for count data when the site-specific mean counts differ, and for survival data with different proportions of failing subjects. Simulations suggest that weights that accommodate intersite variation in imputation error give little power gain compared to those obtained ignoring imputation uncertainties. We note advantages to combining site-specific score statistics, and we show how they can be used to assess effect-size heterogeneity across sites. The utility of the summary score statistic is illustrated by application to a meta-analysis of schizophrenia data in which only site-specific P-values and directions of association are available.  相似文献   

15.
Genetic association studies are a powerful tool to detect genetic variants that predispose to human disease. Once an associated variant is identified, investigators are also interested in estimating the effect of the identified variant on disease risk. Estimates of the genetic effect based on new association findings tend to be upwardly biased due to a phenomenon known as the “winner's curse.” Overestimation of genetic effect size in initial studies may cause follow‐up studies to be underpowered and so to fail. In this paper, we quantify the impact of the winner's curse on the allele frequency difference and odds ratio estimators for one‐ and two‐stage case‐control association studies. We then propose an ascertainment‐corrected maximum likelihood method to reduce the bias of these estimators. We show that overestimation of the genetic effect by the uncorrected estimator decreases as the power of the association study increases and that the ascertainment‐corrected method reduces absolute bias and mean square error unless power to detect association is high. Genet. Epidemiol. 33:453–462, 2009. © 2009 Wiley‐Liss, Inc.  相似文献   

16.
The large number of markers considered in a genome‐wide association study (GWAS) has resulted in a simplification of analyses conducted. Most studies are analyzed one marker at a time using simple tests like the trend test. Methods that account for the special features of genetic association studies, yet remain computationally feasible for genome‐wide analysis, are desirable as they may lead to increased power to detect associations. Haplotype sharing attempts to translate between population genetics and genetic epidemiology. Near a recent mutation that increases disease risk, haplotypes of case participants should be more similar to each other than haplotypes of control participants; conversely, the opposite pattern may be found near a recent mutation that lowers disease risk. We give computationally simple association tests based on haplotype sharing that can be easily applied to GWASs while allowing use of fast (but not likelihood‐based) haplotyping algorithms and properly accounting for the uncertainty introduced by using inferred haplotypes. We also give haplotype‐sharing analyses that adjust for population stratification. Applying our methods to a GWAS of Parkinson's disease, we find a genome‐wide significant signal in the CAST gene that is not found by single‐SNP methods. Further, a missing‐data artifact that causes a spurious single‐SNP association on chromosome 9 does not impact our test. Genet. Epidemiol. 33:657–667, 2009. Published 2009 Wiley‐Liss, Inc.  相似文献   

17.
Multivariate phenotypes are frequently encountered in genome‐wide association studies (GWAS). Such phenotypes contain more information than univariate phenotypes, but how to best exploit the information to increase the chance of detecting genetic variant of pleiotropic effect is not always clear. Moreover, when multivariate phenotypes contain a mixture of quantitative and qualitative measures, limited methods are applicable. In this paper, we first evaluated the approach originally proposed by O'Brien and by Wei and Johnson that combines the univariate test statistics and then we proposed two extensions to that approach. The original and proposed approaches are applicable to a multivariate phenotype containing any type of components including continuous, categorical and survival phenotypes, and applicable to samples consisting of families or unrelated samples. Simulation results suggested that all methods had valid type I error rates. Our extensions had a better power than O'Brien's method with heterogeneous means among univariate test statistics, but were less powerful than O'Brien's with homogeneous means among individual test statistics. All approaches have shown considerable increase in power compared to testing each component of a multivariate phenotype individually in some cases. We apply all the methods to GWAS of serum uric acid levels and gout with 550,000 single nucleotide polymorphisms in the Framingham Heart Study. Genet. Epidemiol. 34:444–454, 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

18.
Interpretation of dense single nucleotide polymorphism (SNP) follow-up of genome-wide association or linkage scan signals can be facilitated by establishing expectation for the behaviour of primary mapping signals upon fine-mapping, under both null and alternative hypotheses. We examined the inferences that can be made regarding the posterior probability of a real genetic effect and considered different disease-mapping strategies and prior probabilities of association. We investigated the impact of the extent of linkage disequilibrium between the disease SNP and the primary analysis signal and the extent to which the disease gene can be physically localised under these scenarios. We found that large increases in significance (>2 orders of magnitude) appear in the exclusive domain of genuine genetic effects, especially in the follow-up of genome-wide association scans or consensus regions from multiple linkage scans. Fine-mapping significant association signals that reside directly under linkage peaks yield little improvement in an already high posterior probability of a real effect. Following fine-mapping, those signals that increase in significance also demonstrate improved localisation. We found local linkage disequiliptium patterns around the primary analysis signal(s) and tagging efficacy of typed markers to play an important role in determining a suitable interval for fine-mapping. Our findings help inform the interpretation and design of dense SNP-mapping follow-up studies, thus facilitating discrimination between a genuine genetic effect and chance fluctuation (false positive).  相似文献   

19.
A major challenge in genome‐wide association studies (GWASs) is to derive the multiple testing threshold when hypothesis tests are conducted using a large number of single nucleotide polymorphisms. Permutation tests are considered the gold standard in multiple testing adjustment in genetic association studies. However, it is computationally intensive, especially for GWASs, and can be impractical if a large number of random shuffles are used to ensure accuracy. Many researchers have developed approximation algorithms to relieve the computing burden imposed by permutation. One particularly attractive alternative to permutation is to calculate the effective number of independent tests, Meff, which has been shown to be promising in genetic association studies. In this study, we compare recently developed Meff methods and validate them by the permutation test with 10,000 random shuffles using two real GWAS data sets: an Illumina 1M BeadChip and an Affymetrix GeneChip® Human Mapping 500K Array Set. Our results show that the simpleM method produces the best approximation of the permutation threshold, and it does so in the shortest amount of time. We also show that Meff is indeed valid on a genome‐wide scale in these data sets based on statistical theory and significance tests. The significance thresholds derived can provide practical guidelines for other studies using similar population samples and genotyping platforms. Genet. Epidemiol. 34:100–105, 2010. © 2009 Wiley‐Liss, Inc.  相似文献   

20.
This article applies the recently proposed "stability selection" procedure of Meinshausen and Bühlmann to the problem of variable selection in genome-wide association. In particular, it explores whether stability selection can identify new regions of interest originally missed or can call into legitimate question regions originally flagged. Our analysis of the seven data sets of the Wellcome Trust Case-Control Consortium suggests that stability selection effectively controls the family-wise error rate but suffers a loss of power. The extensive correlation structure among SNP markers induced by linkage disequilibrium renders the procedure too conservative, causing it to miss regions known to be highly significant from simple marginal analyses. As a remedy one can aggregate nearby SNPs into groups and select groups rather than individual SNPs. The modified procedure can accurately identify the most important regions of genome-wide association, but in a simulation study it still offers less power than simpler and less computationally intensive methods of marginal association testing.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号