共查询到20条相似文献,搜索用时 15 毫秒
1.
Christian X. Weichenberger Martin Gögele Mirko Modenese John Attia Jennifer H. Barrett Michael Boehnke Giuseppe Borsani Giorgio Casari Caroline S. Fox Thomas Freina Andrew A. Hicks Fabio Marroni Giovanni Parmigiani Andrea Pastore Cristian Pattaro Arne Pfeufer Fabrizio Ruggeri Christine Schwienbacher Daniel Taliun Peter P. Pramstaller Francisco S. Domingues John R. Thompson 《Genetic epidemiology》2013,37(2):205-213
Biological plausibility and other prior information could help select genome‐wide association (GWA) findings for further follow‐up, but there is no consensus on which types of knowledge should be considered or how to weight them. We used experts’ opinions and empirical evidence to estimate the relative importance of 15 types of information at the single‐nucleotide polymorphism (SNP) and gene levels. Opinions were elicited from 10 experts using a two‐round Delphi survey. Empirical evidence was obtained by comparing the frequency of each type of characteristic in SNPs established as being associated with seven disease traits through GWA meta‐analysis and independent replication, with the corresponding frequency in a randomly selected set of SNPs. SNP and gene characteristics were retrieved using a specially developed bioinformatics tool. Both the expert and the empirical evidence rated previous association in a meta‐analysis or more than one study as conferring the highest relative probability of true association, whereas previous association in a single study ranked much lower. High relative probabilities were also observed for location in a functional protein domain, although location in a region evolutionarily conserved in vertebrates was ranked high by the data but not by the experts. Our empirical evidence did not support the importance attributed by the experts to whether the gene encodes a protein in a pathway or shows interactions relevant to the trait. Our findings provide insight into the selection and weighting of different types of knowledge in SNP or gene prioritization, and point to areas requiring further research. 相似文献
2.
Clara C. Elbers Kristel R. van Eijk Lude Franke Flip Mulder Yvonne T. van der Schouw Cisca Wijmenga N. Charlotte Onland‐Moret 《Genetic epidemiology》2009,33(5):419-431
Several genome‐wide association studies (GWAS) have been published on various complex diseases. Although, new loci are found to be associated with these diseases, still only very little of the genetic risk for these diseases can be explained. As GWAS are still underpowered to find small main effects, and gene‐gene interactions are likely to play a role, the data might currently not be analyzed to its full potential. In this study, we evaluated alternative methods to study GWAS data. Instead of focusing on the single nucleotide polymorphisms (SNPs) with the highest statistical significance, we took advantage of prior biological information and tried to detect overrepresented pathways in the GWAS data. We evaluated whether pathway classification analysis can help prioritize the biological pathways most likely to be involved in the disease etiology. In this study, we present the various benefits and limitations of pathway‐classification tools in analyzing GWAS data. We show multiple differences in outcome between pathway tools analyzing the same dataset. Furthermore, analyzing randomly selected SNPs always results in significantly overrepresented pathways, large pathways have a higher chance of becoming statistically significant and the bioinformatics tools used in this study are biased toward detecting well‐defined pathways. As an example, we analyzed data from two GWAS on type 2 diabetes (T2D): the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC). Occasionally the results from the DGI and the WTCCC GWAS showed concordance in overrepresented pathways, but discordance in the corresponding genes. Thus, incorporating gene networks and pathway classification tools into the analysis can point toward significantly overrepresented molecular pathways, which cannot be picked up using traditional single‐locus analyses. However, the limitations discussed in this study, need to be addressed before these methods can be widely used. Genet. Epidemiol. 33:419–431, 2009. © 2009 Wiley‐Liss, Inc. 相似文献
3.
Whole genome sequencing (WGS) and whole exome sequencing studies are used to test the association of rare genetic variants with health traits. Many existing WGS efforts now aggregate data from heterogeneous groups, for example, combining sets of individuals of European and African ancestries. We here investigate the statistical implications on rare variant association testing with a binary trait when combining together heterogeneous studies, defined as studies with potentially different disease proportion and different frequency of variant carriers. We study and compare in simulations the Type 1 error control and power of the naïve score test, the saddlepoint approximation to the score test, and the BinomiRare test in a range of settings, focusing on low numbers of variant carriers. We show that Type 1 error control and power patterns depend on both the number of carriers of the rare allele and on disease prevalence in each of the studies. We develop recommendations for association analysis of rare genetic variants. (1) The Score test is preferred when the case proportion in the sample is 50%. (2) Do not down‐sample controls to balance case–control ratio, because it reduces power. Rather, use a test that controls the Type 1 error. (3) Conduct stratified analysis in parallel with combined analysis. Aggregated testing may have lower power when the variant effect size differs between strata. 相似文献
4.
Xiaoyi Gao Lewis C. Becker Diane M. Becker Joshua D. Starmer Michael A. Province 《Genetic epidemiology》2010,34(1):100-105
A major challenge in genome‐wide association studies (GWASs) is to derive the multiple testing threshold when hypothesis tests are conducted using a large number of single nucleotide polymorphisms. Permutation tests are considered the gold standard in multiple testing adjustment in genetic association studies. However, it is computationally intensive, especially for GWASs, and can be impractical if a large number of random shuffles are used to ensure accuracy. Many researchers have developed approximation algorithms to relieve the computing burden imposed by permutation. One particularly attractive alternative to permutation is to calculate the effective number of independent tests, Meff, which has been shown to be promising in genetic association studies. In this study, we compare recently developed Meff methods and validate them by the permutation test with 10,000 random shuffles using two real GWAS data sets: an Illumina 1M BeadChip and an Affymetrix GeneChip® Human Mapping 500K Array Set. Our results show that the simpleM method produces the best approximation of the permutation threshold, and it does so in the shortest amount of time. We also show that Meff is indeed valid on a genome‐wide scale in these data sets based on statistical theory and significance tests. The significance thresholds derived can provide practical guidelines for other studies using similar population samples and genotyping platforms. Genet. Epidemiol. 34:100–105, 2010. © 2009 Wiley‐Liss, Inc. 相似文献
5.
Genome‐wide association studies (GWAS) have been successful in finding numerous new risk variants for complex diseases, but the results almost exclusively rely on single‐marker scans. Methods that can analyze joint effects of many variants in GWAS data are still being developed and trialed. To evaluate the performance of such methods it is essential to have a GWAS data simulator that can rapidly simulate a large number of samples, and capture key features of real GWAS data such as linkage disequilibrium (LD) among single‐nucleotide polymorphisms (SNPs) and joint effects of multiple loci (multilocus epistasis). In the current study, we combine techniques for specifying high‐order epistasis among risk SNPs with an existing program GWAsimulator [Li and Li, 2008] to achieve rapid whole‐genome simulation with accurate modeling of complex interactions. We considered various approaches to specifying interaction models including the following: departure from product of marginal effects for pairwise interactions, product terms in logistic regression models for low‐order interactions, and penetrance tables conforming to marginal effect constraints for high‐order interactions or prescribing known biological interactions. Methods for conversion among different model specifications are developed using penetrance table as the fundamental characterization of disease models. The new program, called simGWA, is capable of efficiently generating large samples of GWAS data with high precision. We show that data simulated by simGWA are faithful to template LD structures, and conform to prespecified diseases models with (or without) interactions. 相似文献
6.
Xi Chen Lily Wang Bo Hu Mingsheng Guo John Barnard Xiaofeng Zhu 《Genetic epidemiology》2010,34(7):716-724
Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome‐wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway‐based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model provides the ability to borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also carries out additional within‐category selection to identify the most important SNPs within each gene set. The proposed model operates in a well‐established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures, and we illustrate the SPCA method using the Wellcome Trust Case‐Control Consortium Crohn Disease (CD) data set. Genet. Epidemiol. 34: 716‐724, 2010. © 2010 Wiley‐Liss, Inc. 相似文献
7.
Beaty TH Ruczinski I Murray JC Marazita ML Munger RG Hetmanski JB Murray T Redett RJ Fallin MD Liang KY Wu T Patel PJ Jin SC Zhang TX Schwender H Wu-Chou YH Chen PK Chong SS Cheah F Yeow V Ye X Wang H Huang S Jabs EW Shi B Wilcox AJ Lie RT Jee SH Christensen K Doheny KF Pugh EW Ling H Scott AF 《Genetic epidemiology》2011,35(6):469-478
Nonsyndromic cleft palate (CP) is a common birth defect with a complex and heterogeneous etiology involving both genetic and environmental risk factors. We conducted a genome-wide association study (GWAS) using 550 case-parent trios, ascertained through a CP case collected in an international consortium. Family-based association tests of single nucleotide polymorphisms (SNP) and three common maternal exposures (maternal smoking, alcohol consumption, and multivitamin supplementation) were used in a combined 2 df test for gene (G) and gene-environment (G × E) interaction simultaneously, plus a separate 1 df test for G × E interaction alone. Conditional logistic regression models were used to estimate effects on risk to exposed and unexposed children. While no SNP achieved genome-wide significance when considered alone, markers in several genes attained or approached genome-wide significance when G × E interaction was included. Among these, MLLT3 and SMC2 on chromosome 9 showed multiple SNPs resulting in an increased risk if the mother consumed alcohol during the peri-conceptual period (3 months prior to conception through the first trimester). TBK1 on chr. 12 and ZNF236 on chr. 18 showed multiple SNPs associated with higher risk of CP in the presence of maternal smoking. Additional evidence of reduced risk due to G × E interaction in the presence of multivitamin supplementation was observed for SNPs in BAALC on chr. 8. These results emphasize the need to consider G × E interaction when searching for genes influencing risk to complex and heterogeneous disorders, such as nonsyndromic CP. 相似文献
8.
Association analysis, with the aim of investigating genetic variations, is designed to detect genetic associations with observable traits, which has played an increasing part in understanding the genetic basis of diseases. Among these methods, haplotype‐based association studies are believed to possess prominent advantages, especially for the rare diseases in case‐control studies. However, when modeling these haplotypes, they are subjected to statistical problems caused by rare haplotypes. Fortunately, haplotype clustering offers an appealing solution. In this research, we have developed a new befitting haplotype similarity for “affinity propagation” clustering algorithm, which can account for the rare haplotypes primely, so as to control for the issue on degrees of freedom. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers merits in detecting disease‐marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34: 633–641, 2010. © 2010 Wiley‐Liss, Inc. 相似文献
9.
10.
For analyzing complex trait association with sequencing data, most current studies test aggregated effects of variants in a gene or genomic region. Although gene‐based tests have insufficient power even for moderately sized samples, pathway‐based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods are originally designed for genome‐wide association studies, and are not comprehensively evaluated for sequencing data. Moreover, region‐based rare variant association methods, although potentially applicable to pathway‐based analysis by extending their region definition to gene sets, have never been rigorously tested. In the context of exome‐based studies, we use simulated and real datasets to evaluate pathway‐based association tests. Our simulation strategy adopts a genome‐wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing the evaluation of pathway‐based methods with realistic quantifiable assumptions on the underlying genetic architectures. The results show that, although no single pathway‐based association method offers superior performance in all simulated scenarios, a modification of Gene Set Enrichment Analysis approach using statistics from single‐marker tests without gene‐level collapsing (weighted Kolmogrov‐Smirnov [WKS]‐Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., sequence kernel association test) to pathway analysis offers a similar power, but its results are sensitive to assumptions of genetic architecture. We applied pathway association analysis to an exome‐sequencing data of the chronic obstructive pulmonary disease, and found that the WKS‐Variant method confirms associated genes previously published. 相似文献
11.
Schaid DJ Sinnwell JP Jenkins GD McDonnell SK Ingle JN Kubo M Goss PE Costantino JP Wickerham DL Weinshilboum RM 《Genetic epidemiology》2012,36(1):3-16
Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provide guidance on the potential strengths and weaknesses of our proposed gene-set analyses. 相似文献
12.
A General Efficient and Flexible Approach for Genome‐Wide Association Analyses of Imputed Genotypes in Family‐Based Designs 下载免费PDF全文
Genotype imputation is a critical technique for following up genome‐wide association studies. Efficient methods are available for dealing with the probabilistic nature of imputed single nucleotide polymorphisms (SNPs) in population‐based designs, but not for family‐based studies. We have developed a new analytical approach (FBATdosage), using imputed allele dosage in the general framework of family‐based association tests to bridge this gap. Simulation studies showed that FBATdosage yielded highly consistent type I error rates, whatever the level of genotype uncertainty, and a much higher power than the best‐guess genotype approach. FBATdosage allows fast linkage and association testing of several million of imputed variants with binary or quantitative phenotypes in nuclear families of arbitrary size with arbitrary missing data for the parents. The application of this approach to a family‐based association study of leprosy susceptibility successfully refined the association signal at two candidate loci, C1orf141‐IL23R on chromosome 1 and RAB32‐C6orf103 on chromosome 6. 相似文献
13.
Weihua Guan Liming Liang Michael Boehnke Gonalo R. Abecasis 《Genetic epidemiology》2009,33(6):508-517
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false‐positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re‐matching after genotyping is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc. 相似文献
14.
The potential of genome-wide association analysis can only be realized when they have power to detect signals despite the detrimental effect of multiple testing on power. We develop a weighted multiple testing procedure that facilitates the input of prior information in the form of groupings of tests. For each group a weight is estimated from the observed test statistics within the group. Differentially weighting groups improves the power to detect signals in likely groupings. The advantage of the grouped-weighting concept, over fixed weights based on prior information, is that it often leads to an increase in power even if many of the groupings are not correlated with the signal. Being data dependent, the procedure is remarkably robust to poor choices in groupings. Power is typically improved if one (or more) of the groups clusters multiple tests with signals, yet little power is lost when the groupings are totally random. If there is no apparent signal in a group, relative to a group that appears to have several tests with signals, the former group will be down-weighted relative to the latter. If no groups show apparent signals, then the weights will be approximately equal. The only restriction on the procedure is that the number of groups be small, relative to the total number of tests performed. 相似文献
15.
Problems associated with insufficient power have haunted the analysis of genome‐wide association studies and are likely to be the main challenge for the analysis of next‐generation sequencing data. Ranking genes according to their strength of association with the investigated phenotype is one solution. To obtain rankings for genes, researchers can draw from a wide range of statistics summarizing the relationships between variants mapped to a gene and the phenotype. Hence, it is of interest to explore the performance of these statistics in the context of rankings. To this end, we conducted a simulation study (limited to genes of equal sizes) of three different summary statistics examining the ability to rank genes in a meaningful order. The weighted sum of squared marginal score test (Pan, 2009), RareCover algorithm (Bahtia et al., 2010) and the elastic net regularization (Zou and Hastie, 2005) were chosen, because they can handle common as well as rare variants. The test based on the score statistic outperformed both other methods in almost all investigated scenarios. It was the only measure to consistently detect genes with interacting causal variants. However, the RareCover algorithm proved better at identifying genes including causal variants with small effect sizes and low minor allele frequency than the weighted sum of squared marginal score test. The performance of the elastic net regularization was unimpressive for all but the simplest scenarios. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献
16.
The interpretation of the results of large association studies encompassing much or all of the human genome faces the fundamental statistical problem that a correspondingly large number of single nucleotide polymorphisms markers will be spuriously flagged as significant. A common method of dealing with these false positives is to raise the significance level for the individual tests for association of each marker. Any such adjustment for multiple testing is ultimately based on a more or less precise estimate for the actual overall type I error probability. We estimate this probability for association tests for correlated markers and show that it depends in a nonlinear way on the significance level for the individual tests. This dependence of the effective number of tests is not taken into account by existing multiple-testing corrections, leading to widely overestimated results. We demonstrate a simple correction for multiple testing, which can easily be calculated from the pairwise correlation and gives far more realistic estimates for the effective number of tests than previous formulae. The calculation is considerably faster than with other methods and hence applicable on a genome-wide scale. The efficacy of our method is shown on a constructed example with highly correlated markers as well as on real data sets, including a full genome scan where a conservative estimate only 8% above the permutation estimate is obtained in about 1% of computation time. As the calculation is based on pairwise correlations between markers, it can be performed at the stage of study design using public databases. 相似文献
17.
Oriol Canela‐Xandri Antonio Julià Josep Lluís Gelpí Sara Marsal 《Genetic epidemiology》2012,36(7):710-716
The detection of gene‐gene interactions (i.e., epistasis) in the human genome is becoming decisive for the complete characterization of the genetic factors associated with complex binary traits. Despite the fact that many methods have been developed to address this challenging issue, their performance still remains insufficient. We will show how case and control groups store complementary information regarding interactions, and the use of this fundamental property in the design of a new, rapid, and highly powerful epistasis analysis method. Unlike previous approaches where statistical methods are tested over a very limited range of situations, we have performed an exhaustive evaluation of the power of our new method. To this end, we also propose a more comprehensive interpretation of epistasis in which genotype interactions may be of risk, protective, or neutral. In this extended view of genetic interactions, we demonstrate that our method has superior performance than existing approaches, thus, providing a highly powerful tool for the identification of gene‐gene interactions associated with binary traits. 相似文献
18.
Genome-wide association (GWA) studies have been extremely successful in identifying novel loci contributing effects to a wide range of complex human traits. However, despite this success, the joint marginal effects of these loci account for only a small proportion of the heritability of these traits. Interactions between variants in different loci are not typically modelled in traditional GWA analysis, but may account for some of the missing heritability in humans, as they do in other model organisms. One of the key challenges in performing gene-gene interaction studies is the computational burden of the analysis. We propose a two-stage interaction analysis strategy to address this challenge in the context of both quantitative traits and dichotomous phenotypes. We have performed simulations to demonstrate only a negligible loss in power of this two-stage strategy, while minimizing the computational burden. Application of this interaction strategy to GWA studies of T2D and obesity highlights potential novel signals of association, which warrant follow-up in larger cohorts. 相似文献
19.
Garner C 《Genetic epidemiology》2007,31(4):288-295
Genome-wide association studies are carried out to identify unknown genes for a complex trait. Polymorphisms showing the most statistically significant associations are reported and followed up in subsequent confirmatory studies. In addition to the test of association, the statistical analysis provides point estimates of the relationship between the genotype and phenotype at each polymorphism, typically an odds ratio in case-control association studies. The statistical significance of the test and the estimator of the odds ratio are completely correlated. Selecting the most extreme statistics is equivalent to selecting the most extreme odds ratios. The value of the estimator, given the value of the statistical significance depends on the standard error of the estimator and the power of the study. This report shows that when power is low, estimates of the odds ratio from a genome-wide association study, or any large-scale association study, will be upwardly biased. Genome-wide association studies are often underpowered given the low alpha levels required to declare statistical significance and the small individual genetic effects known to characterize complex traits. Factors such as low allele frequency, inadequate sample size and weak genetic effects contribute to large standard errors in the odds ratio estimates, low power and upwardly biased odds ratios. Studies that have high power to detect an association with the true odds ratio will have little or no bias, regardless of the statistical significance threshold. The results have implications for the interpretation of genome-wide association analysis and the planning of subsequent confirmatory stages. 相似文献
20.
Case‐control genome‐wide association (GWA) studies have facilitated the identification of susceptibility loci for many complex diseases; however, these studies are often not adequately powered to detect gene‐environment (G×E) and gene‐gene (G×G) interactions. Case‐only studies are more efficient than case‐control studies for detecting interactions and require no data on control subjects. In this article, we discuss the concept and utility of the case‐only genome‐wide interaction (COGWI) study, in which common genetic variants, measured genome‐wide, are screened for association with environmental exposures or genetic variants of interest. An observed G‐E (or G‐G) association, as measured by the case‐only odds ratio (OR), suggests interaction, but only if the interacting factors are unassociated in the population from which the cases were drawn. The case‐only OR is equivalent to the interaction risk ratio. In addition to risk‐related interactions, we discuss how the COGWI design can be used to efficiently detect G×G, G×E and pharmacogenetic interactions related to disease outcomes in the context of observational clinical studies or randomized clinical trials. Such studies can be conducted using only data on individuals experiencing an outcome of interest or individuals not experiencing the outcome of interest. Sharing data among GWA and COGWI studies of disease risk and outcome can further enhance efficiency. Sample size requirements for COGWI studies, as compared to case‐control GWA studies, are provided. In the current era of genome‐wide analyses, the COGWI design is an efficient and straightforward method for detecting G×G, G×E and pharmacogenetic interactions related to disease risk, prognosis and treatment response. Genet. Epidemiol. 34:7–15, 2010. © 2009 Wiley‐Liss, Inc. 相似文献