Similar Articles
1.
Genome‐wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) using various powerful statistical imputation algorithms trained on reference datasets. Using the predicted allele count of an imputed SNP as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how best to handle imputed SNPs in various modern, complex tests for genetic association that incorporate gene–environment interactions. We focus on case‐control association studies in which inference for an underlying logistic regression model can be performed using alternative methods that rely, to varying degrees, on an assumption of gene–environment independence in the underlying population. Because increasingly large‐scale GWAS are being performed through consortium efforts, in which it is preferable to share only summary‐level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta‐analysis of “one‐step” maximum‐likelihood estimates across studies. Applications of the methods in simulation studies and to a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain the type‐I error rates of the underlying testing procedures. For imputed SNPs, as for typed SNPs, the retrospective methods can yield considerable efficiency gains in modeling gene–environment interactions under the assumption of gene–environment independence. The methods are available for public use through the CGEN R software package.
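A minimal sketch of the dosage-based score test mentioned above, assuming the simplest case: an intercept-only logistic null model with no covariates and no gene-environment terms. It does not reproduce CGEN or the retrospective methods; the function name `dosage_score_test` is hypothetical.

```python
import math


def dosage_score_test(dosage, status):
    """Score (Rao) test of H0: beta = 0 in logit P(D=1) = a + beta * dosage.

    dosage: imputed expected allele counts in [0, 2]
    status: 0/1 case-control indicator
    Returns (chi-square statistic on 1 df, p-value).
    """
    n = len(dosage)
    p_hat = sum(status) / n  # MLE of P(D=1) under the intercept-only null
    g_bar = sum(dosage) / n
    # Score for beta evaluated at the null MLE.
    u = sum(g * (y - p_hat) for g, y in zip(dosage, status))
    # Efficient information for beta after profiling out the intercept.
    v = p_hat * (1.0 - p_hat) * sum((g - g_bar) ** 2 for g in dosage)
    chi2 = u * u / v
    # Upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `dosage_score_test([0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 0, 0, 1, 1, 1, 1])` gives a statistic of 8 on 1 df. The dosage simply replaces the hard genotype call in the usual score formula, which is why no extra machinery is needed for imputed SNPs in this simple setting.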

2.
For analysis of the main effects of SNPs, meta‐analysis of summary results from individual studies has been shown to provide results comparable to those of “mega‐analysis,” which jointly analyzes the pooled participant data from the available studies. This fact revolutionized the genetic analysis of complex traits through large GWAS consortia. Investigations of gene‐environment (G×E) interactions are on the rise because they can potentially explain part of the missing heritability and identify individuals at high risk of disease. However, it is not known whether the two approaches yield comparable results for the analysis of gene‐environment interactions. In this empirical study, we report that the results from both methods were largely consistent for all four tests: the standard 1 degree of freedom (df) test of the main effect only, the 1 df test of the main effect (in the presence of an interaction effect), the 1 df test of the interaction effect, and the joint 2 df test of main and interaction effects. They provided similar effect size and standard error estimates, leading to comparable P‐values. The genomic inflation factors and the numbers of SNPs meeting various significance thresholds were also comparable between the two approaches. Mega‐analysis is not always feasible, especially in very large and diverse consortia, since pooling of raw data may be limited by the terms of informed consent. Our study illustrates that meta‐analysis can also be an effective approach for identifying interactions. To our knowledge, this is the first report comparing meta‐ and mega‐analyses for interactions.
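For a single coefficient, such as the 1 df interaction term above, summary-level meta-analysis typically pools per-study estimates by inverse-variance weighting. A minimal fixed-effect sketch, assuming each study reports an estimate and its standard error; it is an illustration of the pooling step, not the pipeline used in the study.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas, ses: per-study coefficient estimates (e.g. of a SNP x E term)
    and their standard errors.
    Returns (pooled beta, pooled SE, z, two-sided p-value).
    """
    weights = [1.0 / (se * se) for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p
```

Each study contributes in proportion to its precision, so a mega-analysis of the pooled data and this meta-analysis can be expected to agree closely when the per-study models are correctly specified and effects are homogeneous.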

3.
Meta‐analysis of genome‐wide association studies (GWAS) has achieved great success in detecting loci underlying human diseases. Incorporating GWAS results from diverse ethnic populations into a meta‐analysis, however, remains challenging because of possible heterogeneity across studies. Conventional fixed‐effects (FE) or random‐effects (RE) methods may not be the most suitable for aggregating multiethnic GWAS results, because of violation of the assumption of homogeneous effects across studies (FE) or low power to detect signals (RE). Three recently proposed methods, the modified RE (RE‐HE) model, the binary‐effects (BE) model, and a Bayesian approach (Meta‐analysis of Transethnic Association [MANTRA]), show increased power over FE and RE methods while accommodating heterogeneity of effects when meta‐analyzing trans‐ethnic GWAS results. We propose a two‐stage approach to account for heterogeneity in trans‐ethnic meta‐analysis, in which studies are clustered using cohort‐specific ancestry information prior to meta‐analysis. In an extensive simulation study, we compare this with a no‐prior‐clustering (crude) approach, evaluating the type I error and power of the two strategies to determine whether the two‐stage approach offers any improvement over the crude approach. We find that both the two‐stage and the crude approaches provide well‐controlled type I error for all five methods (FE, RE, RE‐HE, BE, MANTRA). However, compared with the corresponding crude approach, the two‐stage approach shows increased power for BE and RE‐HE and similar power for MANTRA and FE, especially when there is heterogeneity across the multiethnic GWAS results. These results suggest that prior clustering in the two‐stage approach can be an effective and efficient intermediate step in meta‐analysis to account for multiethnic heterogeneity.
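For reference, the conventional RE method contrasted with FE above is most often the DerSimonian-Laird estimator. A minimal sketch, assuming per-study effect estimates and standard errors are available; it does not implement RE-HE, BE, MANTRA, or the ancestry-based clustering stage.

```python
import math


def dersimonian_laird(betas, ses):
    """Random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator.

    Returns (pooled beta, pooled SE, tau^2, Cochran's Q).
    """
    k = len(betas)
    w = [1.0 / (s * s) for s in ses]
    w_sum = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / w_sum
    # Cochran's Q measures between-study heterogeneity around the FE estimate.
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    c = w_sum - sum(wi * wi for wi in w) / w_sum
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add the between-study variance tau^2.
    w_re = [1.0 / (s * s + tau2) for s in ses]
    w_re_sum = sum(w_re)
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / w_re_sum
    return beta_re, math.sqrt(1.0 / w_re_sum), tau2, q
```

The added tau^2 inflates every study's variance, which widens the pooled standard error when effects differ across populations; this is the source of the low power of RE noted above.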

4.
Genome‐wide association studies are proven tools for finding disease genes, but it is often necessary to combine many cohorts in a meta‐analysis to detect statistically significant genetic effects. The component studies are often performed by different investigators on different populations, using different chips with minimal overlap of single nucleotide polymorphisms (SNPs). In some cases, raw data are not available for imputation, so only results for the genotyped SNPs can be used in the meta‐analysis. Even when SNP sets are comparable, different cohorts may have peak association signals at different SNPs within the same gene because of population differences in linkage disequilibrium or environmental interactions. We hypothesize that the power to detect statistical signals in these situations will improve with a method that simultaneously meta‐analyzes and smooths the signal over nearby markers. In this study, we propose regionally smoothed meta‐analysis methods and compare their performance on real and simulated data.

5.
Recent advances in sequencing technologies have made it possible to explore the influence of rare variants on complex diseases and traits. Meta‐analysis is essential to this exploration because large sample sizes are required to detect rare‐variant associations. Several methods are available for meta‐analysis of rare variants under fixed‐effects models, which assume that the genetic effects are the same across all studies. In practice, genetic associations are likely to be heterogeneous among studies because of differences in population composition, environmental factors, phenotype and genotype measurements, or analysis methods. We propose random‐effects models that allow the genetic effects to vary among studies, and we develop the corresponding meta‐analysis methods for gene‐level association tests. Our methods take score statistics, rather than individual participant data, as input and thus can accommodate any study design and any phenotype. We produce random‐effects versions of all commonly used gene‐level association tests, including burden, variable‐threshold, and variance‐component tests. We demonstrate through extensive simulation studies that our random‐effects tests are substantially more powerful than the fixed‐effects tests in the presence of moderate and high between‐study heterogeneity, and achieve power similar to the latter when heterogeneity is low. The usefulness of the proposed methods is further illustrated with data from the National Heart, Lung, and Blood Institute Exome Sequencing Project (NHLBI ESP). The relevant software is freely available.
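As a point of reference for the score-statistic input described above, here is a minimal sketch of the fixed-effects burden test built from study-level summaries, the baseline that the random-effects tests are compared against. It assumes each study supplies a per-variant score vector and its covariance matrix. The random-effects versions proposed in the abstract are not reproduced, and the function name is hypothetical.

```python
import math


def fixed_effects_burden(scores, covs, weights):
    """Fixed-effects gene-level burden test from study-level score statistics.

    scores:  list over studies of per-variant score vectors U_k (length m)
    covs:    list over studies of m x m covariance matrices V_k of U_k
    weights: per-variant burden weights (length m)
    Returns (chi-square statistic on 1 df, p-value).
    """
    m = len(weights)
    # Under fixed effects, score statistics and their covariances simply add.
    u = [sum(s[j] for s in scores) for j in range(m)]
    v = [[sum(c[i][j] for c in covs) for j in range(m)] for i in range(m)]
    num = sum(w * uj for w, uj in zip(weights, u))
    den = sum(weights[i] * v[i][j] * weights[j]
              for i in range(m) for j in range(m))
    chi2 = num * num / den
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Summing scores across studies is what makes the fixed-effects test lose power when effects differ in sign or size between studies, which motivates the random-effects extensions.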

6.
Many complex diseases are influenced by genetic variation in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome‐wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, pathway analysis also sheds light on the biological processes underlying disease. We propose a new pathway‐based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the SPCA model, a selected subset of the SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of the selected SNPs; therefore, the SPCA model can borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA carries out within‐category selection to identify the most important SNPs within each gene set. The proposed model operates within a well‐established statistical framework and can handle design information such as covariate adjustment and matching in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures, and we illustrate the SPCA method using the Wellcome Trust Case‐Control Consortium Crohn Disease (CD) data set. Genet. Epidemiol. 34: 716‐724, 2010. © 2010 Wiley‐Liss, Inc.

7.
Genome‐wide association studies (GWAS) of complex traits have generated many association signals for single nucleotide polymorphisms (SNPs). To identify the underlying causal genetic variant(s), focused DNA resequencing of targeted genomic regions is commonly used, yet the current cost of resequencing limits sample sizes for resequencing studies. Information from the large GWAS, such as the SNP genotypes in the targeted genomic region, can be used to guide the choice of samples for resequencing. If the GWAS tag‐SNPs are viewed as imperfect surrogates for the underlying causal variants that are nonetheless expected to be correlated with them, a reasonable approach is a two‐phase case‐control design, with the GWAS serving as the first phase and the resequencing study as the second phase. Using stratified sampling based on both tag‐SNP genotypes and case‐control status, we explore the gains in power of a two‐phase design relative to randomly sampling cases and controls for resequencing (i.e., ignoring tag‐SNP genotypes). Simulation results show that stratified sampling based on both tag‐SNP genotypes and case‐control status is unlikely to have lower power than stratified sampling based on case‐control status alone, and can sometimes have substantially greater power. The gain in power depends on the amount of linkage disequilibrium between the tag‐SNP and causal variant alleles, as well as on the effect size of the causal variant. Hence, the two‐phase design provides an efficient approach for following up GWAS signals with DNA resequencing.

8.
Introduction: Genetic discoveries are validated through the meta‐analysis of genome‐wide association scans in large international consortia. Because environmental variables may interact with genetic factors, investigation of differing genetic effects across distinct levels of an environmental exposure in these large consortia may yield additional susceptibility loci undetected by main‐effects analysis. We describe a method of joint meta‐analysis (JMA) of SNP and SNP‐by‐environment (SNP × E) regression coefficients for use in gene‐environment interaction studies. Methods: In testing SNP × E interactions, one approach uses a two‐degree‐of‐freedom test to identify genetic variants that influence the trait of interest. This approach detects both the main effect of the SNP on the trait and its interaction with the environment. We propose a method to jointly meta‐analyze the SNP and SNP × E coefficients using multivariate generalized least squares. This approach provides confidence intervals for the two estimates, a joint significance test for the SNP and SNP × E terms, and a test of homogeneity across samples. Results: We present a simulation study comparing this method with four other methods of meta‐analysis and demonstrate that JMA performs better than the others when both main and interaction effects are present. Additionally, we applied our methods in a meta‐analysis of the association between SNPs in the type 2 diabetes‐associated gene PPARG and log‐transformed fasting insulin levels, with interaction by body mass index, in a combined sample of 19,466 individuals from five cohorts. Genet. Epidemiol. 35:11–18, 2011. © 2010 Wiley‐Liss, Inc.
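The multivariate generalized least squares pooling described above can be sketched as follows, assuming each cohort reports the two coefficients (SNP and SNP × E) together with their 2 × 2 covariance matrix. This minimal fixed-effect illustration covers the pooling step and the joint 2-df test only; the homogeneity test is omitted, and the helper names are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def joint_meta_2df(estimates, covariances):
    """Multivariate GLS (fixed-effect) meta-analysis of (beta_G, beta_GxE).

    estimates:   per-study [beta_G, beta_GxE]
    covariances: per-study 2 x 2 covariance matrices of those estimates
    Returns (pooled estimates, pooled covariance, 2-df Wald chi2, p-value).
    """
    info = [[0.0, 0.0], [0.0, 0.0]]  # sum_k V_k^{-1}
    score = [0.0, 0.0]               # sum_k V_k^{-1} b_k
    for b, v in zip(estimates, covariances):
        vi = inv2(v)
        for i in range(2):
            score[i] += vi[i][0] * b[0] + vi[i][1] * b[1]
            for j in range(2):
                info[i][j] += vi[i][j]
    cov = inv2(info)
    beta = [cov[i][0] * score[0] + cov[i][1] * score[1] for i in range(2)]
    # Joint Wald statistic beta' info beta tests H0: beta_G = beta_GxE = 0.
    chi2 = sum(beta[i] * info[i][j] * beta[j]
               for i in range(2) for j in range(2))
    p_value = math.exp(-chi2 / 2.0)  # survival function of chi-square(2)
    return beta, cov, chi2, p_value
```

Using the full covariance matrix, rather than pooling each coefficient separately, keeps the correlation between the main-effect and interaction estimates inside the joint test.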

9.
10.
Exhaustive testing of all possible SNP pairs in a genome‐wide association study (GWAS) generally yields low power to detect gene‐gene (G × G) interactions because of small effect sizes and stringent requirements for multiple‐testing correction. We introduce a new two‐step procedure for testing G × G interactions in case‐control GWAS to detect interacting single nucleotide polymorphisms (SNPs) regardless of their marginal effects. In an initial screening step, all SNP pairs are tested for gene‐gene association in the combined sample of cases and controls. In the second step, the pairs that pass the screening are followed up with a traditional test for G × G interaction. We show that the two‐step method is substantially more powerful for detecting G × G interactions than the exhaustive testing approach. For example, with 2,000 cases and 2,000 controls, the two‐step method can have more than 90% power to detect an interaction odds ratio of 2.0, compared with less than 50% power for the exhaustive testing approach. Moreover, we show that a hybrid two‐step approach, which combines our newly proposed two‐step test with the two‐step test that screens for marginal effects, retains the best power properties of both. The two‐step procedures we introduce have the potential to uncover genetic signals not previously identified in an initial single‐SNP GWAS. We demonstrate the computational feasibility of the two‐step G × G procedure by performing a G × G scan in the asthma GWAS of the University of Southern California Children's Health Study.
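The power gain of the two-step procedure comes largely from the multiple-testing bookkeeping: the second-step threshold is corrected only for the pairs that pass the screen, not for all pairs. A minimal sketch of that bookkeeping, assuming the screening and interaction p-values have already been computed; the default thresholds and the function name are illustrative, not values from the study.

```python
def two_step_gxg(pairs, alpha=0.05, screen_alpha=1e-4):
    """Two-step G x G testing with Bonferroni correction in step 2 only.

    pairs: list of (pair_id, screening_p, interaction_p), where screening_p
           comes from the association screen in the combined case-control
           sample and interaction_p from the usual G x G interaction test.
    Returns the ids of pairs declared significant.
    """
    # Step 1: keep only pairs whose screening p-value passes the threshold.
    passed = [(pid, p_int) for pid, p_scr, p_int in pairs
              if p_scr < screen_alpha]
    if not passed:
        return []
    # Step 2: correct only for the number of pairs that passed the screen.
    threshold = alpha / len(passed)
    return [pid for pid, p_int in passed if p_int < threshold]
```

By contrast, an exhaustive scan of n SNPs must use a threshold of alpha / (n(n - 1)/2), which for a genome-wide panel is many orders of magnitude stricter.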

11.
With varying, but substantial, proportions of heritability remaining unexplained by summaries of single‐SNP genetic variation, there is a demand for methods that extract maximal information from genetic association studies. One source of variation that is difficult to assess is genetic interaction. A major challenge for naive detection methods is the large number of possible combinations, with a requisite need to correct for multiple testing. Assuming large marginal effects in order to reduce the search space may be restrictive and may miss higher‐order interactions with modest marginal effects. In this paper, we propose a new procedure for detecting gene‐by‐gene interactions through heterogeneity in estimated low‐order (e.g., marginal) effect sizes, by leveraging population structure, or ancestral differences, among studies in which the same phenotypes were measured. We implement this approach in a meta‐analytic framework, which offers numerous advantages, such as robustness and computational efficiency, and is necessary when data‐sharing limitations restrict joint analysis. We apply a dimension‐reduction procedure that scales to allow searches for higher‐order interactions. For comparison with our method, which we term phylogenY‐aware Effect‐size Tests for Interactions (YETI), we adapt to our meta‐analytic framework an existing method that assumes interacting loci will exhibit strong marginal effects. As expected, YETI excels when multiple studies are from highly differentiated populations and maintains its superiority in these conditions even when marginal effects are small. When these conditions are less extreme, the advantage of our method wanes. We assess the Type‐I error and power characteristics of these complementary approaches to evaluate their strengths and limitations.

12.
In genome‐wide association studies (GWAS), it is common practice to impute the genotypes of untyped single nucleotide polymorphisms (SNPs) by exploiting the linkage disequilibrium structure among SNPs. The use of imputed genotypes improves genome coverage and makes it possible to perform meta‐analyses combining results from studies genotyped on different platforms. A popular way of using imputed data is the “expectation‐substitution” method, which treats the imputed dosage as if it were the true genotype. In current practice, the estimates given by the expectation‐substitution method are usually combined in meta‐analysis using an inverse variance weighting (IVM) scheme. However, IVM is not optimal, because the estimates given by the expectation‐substitution method are generally biased. The optimal weight is, in fact, proportional to both the inverse variance and the expected value of the effect size estimate. We show both theoretically and numerically that the bias of the estimates is very small under the practical conditions of low effect sizes in GWAS. This finding validates the use of the expectation‐substitution method and shows that the inverse variance is a good approximation of the optimal weight. Through simulation, we compared the power of the IVM method with that of several other methods, including the optimal weight, regular z‐score meta‐analysis, and a recently proposed “imputation‐aware” meta‐analysis method (Zaitlen and Eskin [2010] Genet Epidemiol 34:537–542). Our results show that the performance of inverse variance weighting is always indistinguishable from that of the optimal weight and similar to or better than that of the other two methods. Genet. Epidemiol. 35:597‐605, 2011. © 2011 Wiley Periodicals, Inc.
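To make the comparison above concrete, here is a minimal sketch of the two conventional schemes: inverse variance weighting and the sample-size-weighted z-score method. It ignores imputation quality and does not implement the optimal weight or the imputation-aware method cited above. The function names are illustrative.

```python
import math


def ivw_z(betas, ses):
    """Inverse-variance-weighted meta-analysis z-score."""
    w = [1.0 / (s * s) for s in ses]
    beta = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    return beta * math.sqrt(sum(w))


def weighted_z(betas, ses, sample_sizes):
    """Sample-size-weighted (Stouffer-type) z-score meta-analysis."""
    zs = [b / s for b, s in zip(betas, ses)]
    w = [math.sqrt(n) for n in sample_sizes]
    return (sum(wi * z for wi, z in zip(w, zs))
            / math.sqrt(sum(wi * wi for wi in w)))
```

When each study's standard error is proportional to 1/sqrt(n), the two statistics coincide exactly. They diverge when standard errors are not proportional to 1/sqrt(n), for example when imputation quality differs between studies of the same size.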

13.
Genome‐wide association studies (GWAS) have been widely used to identify genetic effects on complex diseases or traits. Most currently used methods are based on separate single‐nucleotide polymorphism (SNP) analyses. Because this approach requires correction for multiple testing to avoid excessive false‐positive results, it suffers from reduced power to detect weak genetic effects when sample size is limited. To increase the power to detect multiple weak genetic factors, and to reduce false‐positive results caused by multiple tests and dependence among test statistics, a modified forward multiple regression (MFMR) approach is proposed. Simulation studies show that MFMR has higher power than the Bonferroni and false discovery rate procedures for detecting moderate and weak genetic effects, and that MFMR retains an acceptable false‐positive rate even if causal SNPs are correlated with many SNPs because of population stratification or other unknown reasons. Genet. Epidemiol. 33:518–525, 2009. © 2009 Wiley‐Liss, Inc.

14.
There is a growing recognition that gene–environment interaction (G × E) plays a pivotal role in the development and progression of complex diseases. Despite a wealth of genetic data on various complex diseases and traits generated by association and sequencing studies, detecting G × E via genome‐wide analysis remains challenging because of limited power. In genome‐wide G × E studies, a common strategy to improve power is to first conduct a filtering test and retain only the genetic variants that pass the filtering step for subsequent G × E analyses. Two‐stage, multistage, and unified tests have been proposed that jointly consider the filtering statistics in G × E tests. However, such G × E tests based on data from a single study may still be underpowered. Meanwhile, large‐scale consortia have been formed to borrow strength across studies and populations. In this work, motivated by existing single‐study G × E tests with filtering and by the need for meta‐analytic G × E approaches based on consortium data, we propose a meta‐analysis framework for detecting gene‐based G × E effects and introduce meta‐analysis‐based filtering statistics into gene‐level G × E tests. Simulations demonstrate the advantages of the proposed method, the ofGEM test. We apply the proposed tests to existing data from two breast cancer consortia to identify genes harboring genetic variants with age‐dependent penetrance (i.e., gene–age interaction effects). We have developed an R software package, ofGEM, for the proposed meta‐analysis tests.

15.
Genome‐wide association studies (GWAS) have been successful in identifying common variants related to complex disorders. However, some disorders have proved resistant to this strategy, with few associations confirmed despite evidence of a genetic component from twin and family studies. Sophisticated strategies that account for phenotypic heterogeneity may be required to uncover these genetic contributions. Age at onset is one potential source of such heterogeneity in ischaemic stroke. We explore the contribution of age at onset in the Wellcome Trust Case‐Control Consortium 2 ischaemic stroke study. We first examine four established stroke loci in younger‐onset cases. We extend this to all single‐nucleotide polymorphisms (SNPs) genome‐wide, testing for stronger association signals in younger subsets of cases. Finally, we estimate the pseudoheritability accounted for by common SNPs present on genome‐wide genotyping arrays, for cases stratified by age at onset. We find evidence of stronger associations in younger‐onset cases for the four established stroke loci. Genome‐wide, in the cardioembolic and small‐vessel stroke subphenotypes, a significant number of SNPs show stronger association P‐values when the oldest cases are removed. Finally, we show that the pseudoheritability estimated from common SNPs in cardioembolic stroke increased from 16.5% for older‐onset cases to 28.5% for younger‐onset cases. Our results indicate that age at onset is a valuable measure for case ascertainment and in the analysis of GWAS in ischaemic stroke: focussing on younger cases, who may have a stronger genetic predisposition, increases power to detect associations.

16.
Meta‐analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although standard meta‐analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with varying case–control ratios. Here, we investigate the power loss of standard meta‐analysis methods for unbalanced studies and propose novel meta‐analysis methods that perform equivalently to joint analysis under both balanced and unbalanced settings. We derive improved meta‐score‐statistics that accurately approximate the joint‐score‐statistics computed from combined individual‐level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In simulated gene‐level association studies under unbalanced settings, our method recovered up to 85% of the power lost with the standard methods. We further showed the power gain of our methods in gene‐level tests with 26 unbalanced studies of age‐related macular degeneration. In addition, we used the meta‐analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta‐analyzing multi‐ethnic samples. In summary, our improved meta‐score‐statistics with corrections for population stratification can be used to construct both single‐variant and gene‐level association tests, providing a useful framework for well‐powered, convenient, cross‐study analyses.

17.
The primary circulating form of vitamin D is 25‐hydroxyvitamin D (25(OH)D), a modifiable trait linked with a growing number of chronic diseases. In addition to environmental determinants of 25(OH)D, including dietary sources and skin ultraviolet B (UVB) exposure, twin‐ and family‐based studies suggest that genetics contributes substantially to vitamin D variability, with heritability estimates ranging from 43% to 80%. Genome‐wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) located in four gene regions associated with 25(OH)D. These SNPs collectively explain only a fraction of the heritability in 25(OH)D estimated by twin‐ and family‐based studies. Using 25(OH)D concentrations and GWAS data on 5,575 subjects drawn from five cohorts, we hypothesized that genome‐wide data, in the form of (1) a polygenic score comprising hundreds or thousands of SNPs that do not individually reach GWAS significance, or (2) a linear mixed model for genome‐wide complex trait analysis, would explain variance in measured circulating 25(OH)D beyond that explained by known genome‐wide significant 25(OH)D‐associated SNPs. GWAS‐identified SNPs explained 5.2% of the variation in circulating 25(OH)D in these samples, and there was little evidence that additional markers significantly improved predictive ability. On average, a polygenic score comprising GWAS‐identified SNPs explained a larger proportion of the variation in circulating 25(OH)D than scores comprising thousands of SNPs that were, on average, nonsignificant. A linear mixed model for genome‐wide complex trait analysis explained little additional variability (range 0–22%). The absence of a significant polygenic effect in this relatively large sample suggests an oligogenic architecture for 25(OH)D.
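The polygenic score in hypothesis (1) is, in its common form, a weighted sum of allele counts, and "variation explained" is usually reported as the R² of the trait on the score. A minimal sketch, assuming per-SNP weights come from an external GWAS. The SNP selection thresholds and weights used in the study are not reproduced, and the function names are illustrative.

```python
def polygenic_score(genotypes, weights):
    """Weighted allele-count polygenic score for one individual.

    genotypes: allele counts (or imputed dosages) for each SNP
    weights:   per-SNP effect sizes, e.g. GWAS regression coefficients
    """
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return sum(g * w for g, w in zip(genotypes, weights))


def r_squared(x, y):
    """Squared Pearson correlation: share of variance in y explained by a
    simple linear regression on x (e.g. trait on polygenic score)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Adding thousands of weakly associated SNPs to the score adds noise as well as signal, which is one way a score built from a few GWAS-identified SNPs can explain more variation than a much larger one.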

18.
As the cost of genome‐wide genotyping decreases, the number of genome‐wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. Owing to its system‐level interpretability, pathway analysis has therefore become a popular tool for gaining insight into the underlying biology from high‐throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single‐marker statistics and assume that pathways are independent of each other. Because biological systems are driven by complex biomolecular interactions, the complex relationships between single‐nucleotide polymorphisms (SNPs) and pathways need to be taken into account. To incorporate the complexity of gene‐gene interactions and pathway‐pathway relationships, we propose a system‐level pathway analysis approach, synthetic feature random forest (SF‐RF), designed to detect pathway‐phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated via random forest (RF) into a synthetic feature representing that pathway. Multiple synthetic features are analyzed simultaneously using RF, and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF‐RF with pathway‐based Statistical Epistasis Network (SEN) analysis, which evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insight into the genetic mechanisms contributing to the pathway‐phenotype association. We apply SF‐RF to a population‐based genetic study of bladder cancer and further use SEN to investigate the mechanisms that help explain the pathway‐phenotype associations. The bladder cancer‐associated pathways we found are consistent with existing biological knowledge and also suggest novel, plausible hypotheses for future biological validation.

19.
Genome‐wide association studies (GWAS) and candidate‐gene studies have implicated single‐nucleotide polymorphisms (SNPs) in at least 45 different genes as putative glioma risk factors. Attempts to validate these associations have yielded variable results, and few genetic risk factors have been consistently replicated. We conducted a case‐control study of Caucasian glioma cases and controls from the University of California San Francisco (810 cases, 512 controls) and the Mayo Clinic (852 cases, 789 controls) in an attempt to replicate previously reported genetic risk factors for glioma. Sixty SNPs selected from the literature (eight from GWAS and 52 from candidate‐gene studies) were successfully genotyped on an Illumina custom genotyping panel. Eight SNPs in or near seven different genes (TERT, EGFR, CCDC26, CDKN2A, PHLDB1, RTEL1, TP53) were significantly associated with glioma risk in the combined dataset (P < 0.05), with all associations in the same direction as in previous reports. Several SNP associations showed considerable differences across histologic subtypes. All eight successfully replicated associations were first identified by GWAS, whereas none of the putative risk SNPs from candidate‐gene studies was associated in the full case‐control sample (all P values > 0.05). Although several confirmed associations are located near genes long known to be involved in gliomagenesis (e.g., EGFR, CDKN2A, TP53), these associations were first discovered by the GWAS approach and are in noncoding regions. These results highlight that the deficiencies of the candidate‐gene approach lie in selecting both appropriate genes and relevant SNPs within those genes.

20.
With a typical sample size of a few thousand subjects, a single genome‐wide association study (GWAS) using traditional methods that test one single nucleotide polymorphism (SNP) at a time can detect only genetic variants conferring a sizable effect on disease risk. Set‐based methods, which analyze sets of SNPs jointly, can detect variants with smaller effects acting within a gene, a pathway, or other biologically relevant sets. Although self‐contained set‐based methods (those that test sets of variants without regard to variants not in the set) are generally more powerful than competitive set‐based approaches (those that compare variants in the set of interest with variants not in the set), there is no consensus as to which self‐contained methods are best. In particular, several self‐contained set tests have been proposed that directly or indirectly “adapt” to the a priori unknown proportion and distribution of effects of the truly associated SNPs in the set, which is a major determinant of their power. A popular adaptive set‐based test is the adaptive rank truncated product (ARTP), which seeks the subset of SNPs that yields the best combined evidence of association. We compared the standard ARTP, several ARTP variations that we introduced, and other adaptive methods in a comprehensive simulation study to evaluate their performance. We used permutations to assess significance for all methods, thus providing a level playing field for comparison. We found the standard ARTP test to have the highest power across our simulations, followed closely by the global model of random effects (GMRE) and a least absolute shrinkage and selection operator (LASSO)‐based test.
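The ARTP statistic computes, for several truncation points K, the product of the K smallest SNP p-values, and then adapts over K with a minimum-p step calibrated on the same null replicates. A minimal sketch, assuming the SNP-level p-values are independent so that null replicates can be drawn as uniform vectors. The study itself assessed significance by permutation, which is required when SNPs are in linkage disequilibrium. The function name and parameters are illustrative, not from any package.

```python
import bisect
import math
import random


def artp_pvalue(p_obs, truncation_points, n_rep=1000, seed=1):
    """Adaptive rank truncated product (ARTP) p-value for one SNP set.

    Simplifying assumption: SNP-level p-values are independent under the
    null, so null replicates are i.i.d. Uniform(0, 1] vectors. With
    correlated SNPs, replicates should come from phenotype permutations.
    Every truncation point must be between 1 and len(p_obs).
    """
    rng = random.Random(seed)
    ks = sorted(truncation_points)

    def rtp_stats(pvals):
        # W_K = -sum(log p) over the K smallest p-values, for each K in ks.
        ordered = sorted(pvals)
        stats, running = [], 0.0
        for k in range(1, ks[-1] + 1):
            running -= math.log(ordered[k - 1])
            if k in ks:
                stats.append(running)
        return stats

    # Replicate 0 is the observed data; 1..n_rep are null replicates.
    reps = [rtp_stats(p_obs)]
    reps += [rtp_stats([1.0 - rng.random() for _ in p_obs])
             for _ in range(n_rep)]
    total = len(reps)

    # Per-K p-value of every replicate, ranked against all replicates;
    # keep each replicate's minimum over K.
    min_p = [1.0] * total
    for j in range(len(ks)):
        column = sorted(r[j] for r in reps)
        for b, r in enumerate(reps):
            n_ge = total - bisect.bisect_left(column, r[j])
            min_p[b] = min(min_p[b], n_ge / total)

    # ARTP p-value: how often a replicate's minimum is as extreme as observed.
    return sum(1 for mp in min_p if mp <= min_p[0]) / total
```

Reusing the same replicates for both the per-K p-values and the final minimum-p calibration avoids a nested permutation loop, which is what makes adaptive tests of this kind computationally practical.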


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417