20 similar documents found (search time: 15 ms)
1.
Clayton D, Genetic Epidemiology, 2012, 36(4): 409-418
"Complex" diseases are, by definition, influenced by multiple causes, both genetic and environmental, and statistical work on the joint action of multiple risk factors has, for more than 40 years, been dominated by the generalized linear model (GLM). In genetics, models for dichotomous traits have traditionally been approached via the model of an underlying, normally distributed liability. This corresponds to the GLM with binomial errors and a probit link function. Elsewhere in epidemiology, however, the logistic regression model, a GLM with a logit link function, has been the tool of choice, largely because of its convenient properties in case-control studies. The choice of link function has usually been dictated by mathematical convenience, but it has important implications for (a) the choice of association test statistic in the presence of existing strong risk factors, (b) the ability to predict disease from genotype given its heritability, and (c) the definition and interpretation of epistasis (or epistacy). These issues are reviewed, and a new association test is proposed.
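Point (c), that whether two risk factors "interact" depends on the link function, can be illustrated with a minimal Python sketch. It assumes two hypothetical binary risk factors whose effects are exactly additive on the liability (probit) scale; the intercept and effect sizes are illustrative values, not estimates from any study. The same penetrances then show a non-zero interaction contrast on the logit scale.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal liability: mean 0, sd 1


def probit_inv(eta: float) -> float:
    """Liability-threshold (probit) model: P(disease) = Phi(eta)."""
    return _STD_NORMAL.cdf(eta)


def probit(p: float) -> float:
    return _STD_NORMAL.inv_cdf(p)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def interaction_contrast(p00, p10, p01, p11, link):
    """Departure from additivity of two binary factors on the given link scale."""
    return link(p11) - link(p10) - link(p01) + link(p00)


# Hypothetical factors acting additively on the probit (liability) scale.
intercept, b1, b2 = -2.0, 1.0, 1.0
p = {
    (g1, g2): probit_inv(intercept + b1 * g1 + b2 * g2)
    for g1 in (0, 1)
    for g2 in (0, 1)
}
cells = (p[0, 0], p[1, 0], p[0, 1], p[1, 1])

probit_ixn = interaction_contrast(*cells, link=probit)  # ~0 by construction
logit_ixn = interaction_contrast(*cells, link=logit)    # non-zero on this scale
print(f"probit-scale interaction: {probit_ixn:.3f}")
print(f"logit-scale interaction:  {logit_ixn:.3f}")
```

With these values the logit-scale contrast is roughly -0.42, so the "interaction" exists only because the logit link was chosen.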
2.
Genome‐wide association studies (GWAS) have been successful in finding numerous new risk variants for complex diseases, but the results almost exclusively rely on single‐marker scans. Methods that can analyze the joint effects of many variants in GWAS data are still being developed and trialed. To evaluate the performance of such methods, it is essential to have a GWAS data simulator that can rapidly simulate a large number of samples and capture key features of real GWAS data, such as linkage disequilibrium (LD) among single‐nucleotide polymorphisms (SNPs) and joint effects of multiple loci (multilocus epistasis). In the current study, we combine techniques for specifying high‐order epistasis among risk SNPs with an existing program, GWAsimulator [Li and Li, 2008], to achieve rapid whole‐genome simulation with accurate modeling of complex interactions. We considered various approaches to specifying interaction models, including the following: departure from the product of marginal effects for pairwise interactions, product terms in logistic regression models for low‐order interactions, and penetrance tables conforming to marginal effect constraints for high‐order interactions or prescribing known biological interactions. Methods for conversion among different model specifications are developed using the penetrance table as the fundamental characterization of disease models. The new program, called simGWA, is capable of efficiently generating large samples of GWAS data with high precision. We show that data simulated by simGWA are faithful to template LD structures and conform to prespecified disease models with (or without) interactions.
3.
Unraveling the nature of genetic interactions is crucial to obtaining a more complete picture of complex diseases. Gene-gene interactions are thought to play an important role in the etiology of cancer, cardiovascular disease, and immune-mediated disease. Interactions among genes are defined as phenotypic effects that differ from those expected from the independent contributions of each gene, and are usually detected by univariate logistic regression methods. Using a multivariate extension of linkage disequilibrium (LD), we have developed a new method, based on distances between sample covariance matrices for groups of single nucleotide polymorphisms (SNPs), to test for interaction effects of two groups of genes associated with a disease phenotype. Since a disease-associated interacting locus will often be in LD with more than one marker in the region, a method that examines a set of markers in a region collectively can offer greater power than traditional methods. Our method effectively identifies interaction effects in simulated data, as well as in data on the genetic contributions to the risk of graft-versus-host disease following hematopoietic stem cell transplantation.
4.
The explosion of genetic information over the last decade presents an analytical challenge for genetic association studies. As the number of genetic variables examined per individual increases, both variable selection and statistical modeling tasks must be performed during analysis. While these tasks could be performed separately, coupling them is necessary to select meaningful variables that effectively model the data. This challenge is heightened by the complex nature of the phenotypes under study and of their underlying genetic etiologies. To address this problem, a number of novel methods have been developed. In the current study, we compare the performance of six analytical approaches in detecting both main effects and gene-gene interactions across a range of genetic models: multifactor dimensionality reduction, grammatical evolution neural networks, random forests, the focused interaction testing framework, stepwise logistic regression, and explicit logistic regression. As one might expect, the relative success of each method is context dependent. This study demonstrates the strengths and weaknesses of each method and illustrates the importance of continued methods development.
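Multifactor dimensionality reduction (MDR), one of the six methods compared, works by collapsing multilocus genotypes into a single high-risk/low-risk attribute. The sketch below shows only that core pooling step for unrelated cases and controls, assuming the common rule that a genotype cell is high-risk when its case:control ratio exceeds a threshold (by default the overall ratio). It omits the cross-validation, model search, and permutation testing of a full MDR analysis, and it is not MDR-PDT.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, status, threshold=None):
    """Label each multilocus genotype cell high-risk (True) or low-risk (False).

    genotypes: list of tuples, e.g. (g_snp1, g_snp2) coded 0/1/2
    status:    list of 1 (case) / 0 (control), same length as genotypes
    threshold: case:control ratio cut-off; defaults to the overall ratio
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    if threshold is None:
        threshold = sum(cases.values()) / max(sum(controls.values()), 1)
    labels = {}
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        # A cell containing cases but no controls is treated as high-risk.
        ratio = n_case / n_ctrl if n_ctrl else float("inf")
        labels[cell] = ratio > threshold
    return labels


def mdr_classify(genotypes, labels):
    """Map each individual's multilocus genotype to 1 (high risk) or 0 (low risk)."""
    return [int(labels.get(g, False)) for g in genotypes]


genotypes = [(0, 0), (0, 0), (1, 1), (1, 1), (1, 1), (0, 1)]
status = [0, 0, 1, 1, 0, 1]
labels = mdr_high_risk_cells(genotypes, status)
print(mdr_classify(genotypes, labels))
```

The resulting one-dimensional attribute is what MDR then evaluates for classification accuracy or prediction error.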
5.
Complex diseases are presumed to result from interactions of several genes and environmental factors, with each gene having only a small effect on the disease. Thus, methods that can account for gene-gene interactions when searching for a set of marker loci in different genes or across the genome, and that analyze these loci jointly, are critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for "base learners" and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. When the ELA is applied to a data set, we obtain a final model, an overall P-value for the association test between the set of loci in the final model and the trait, and an importance measure for each base learner and each marker in the final model. The final model is a linear combination of some base learners, and we know which base learners represent main effects and which represent interaction effects. The importance measure of each base learner or marker indicates its relative importance in the final model. We used intensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single-marker test in all the simulation scenarios. The ELA also outperformed three other existing multi-locus methods in almost all cases.
In an application to a large-scale case-control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi-locus effect (P-value=0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two-locus combinations showed significant two-locus interaction effects.
6.
The nonlinear interaction effect among multiple genetic factors, i.e. epistasis, has been recognized as a key component in understanding the underlying genetic basis of complex human diseases and phenotypic traits. Due to the statistical and computational complexity, most epistasis studies are limited to interactions of order two. We developed ViSEN to analyze and visualize both two‐way and three‐way epistatic interactions. ViSEN not only identifies strong interactions among pairs or trios of genetic attributes, but also provides a global interaction map that shows neighborhood and clustering structures. This visualized information could be very helpful for inferring the underlying genetic architecture of complex diseases and for generating plausible hypotheses for further biological validation. ViSEN is implemented in Java and freely available at https://sourceforge.net/projects/visen/ .
7.
Genetic association studies have been less successful than expected in detecting causal genetic variants, with frequent non-replication when such variants are claimed. Numerous possible reasons have been postulated, including inadequate sample size and possible unobserved stratification. Another possibility, and the focus of this paper, is epistasis, or gene-gene interaction. Although it is unlikely that information about disease mechanism can be gleaned purely from the data, it may be possible to increase our power to detect an effect by allowing for epistasis within our test statistic. This paper derives an appropriate "omnibus" test for detecting causal loci whilst allowing for numerous possible interactions, and compares the power of such a test with that of the usual main effects test. This approach differs from that commonly used, for example by Marchini et al. [2005], in that it tests simultaneously for main effects and interactions, rather than interactions alone. The alternative hypothesis tested by the "omnibus" test is that a particular locus of interest has an effect on disease status, either marginally or epistatically, and the test is therefore directly comparable to the main effects test at that locus. The paper begins by considering the direct case, in which the putative causal variants are observed, and then extends these ideas to the indirect case, in which the causal variants are unobserved and we have a set of tag single nucleotide polymorphisms (tag SNPs) representing the regions of interest. In passing, the derivation of the indirect omnibus test statistic leads to a novel "indirect case-only test for interaction".
8.
Risch N, Genetic Epidemiology, 1984, 1(2): 207-211
The workshop data were examined using a newly developed methodology (MILINK, Risch, 1984) for combined segregation, linkage, and association analysis of a complex disease trait in pedigree data. Results from problems two and three suggest that the method is powerful both for determining the mode of disease inheritance and for distinguishing linkage disequilibrium from pleiotropy (with epistasis) of marker alleles.
9.
Wang K, Genetic Epidemiology, 2008, 32(7): 606-614
A genetic variant is very likely to manifest its effect on disease through its main effect as well as through its interaction with other genetic variants or environmental factors. Power to detect genetic variants can be greatly improved by modeling their main effects and their interaction effects through a common set of parameters, or "generalized association parameters" (Chatterjee et al. [2006] Am. J. Hum. Genet. 79:1002-1016), because of the reduced number of degrees of freedom. Following this idea, I propose two models that extend the work of Chatterjee and colleagues. In particular, I consider not only the case of a relatively weak interaction effect compared with the main effect, but also the case of a relatively weak main effect. This latter case is perhaps more relevant to genetic association studies. The proposed methods are invariant to the choice of the allele used for scoring genotypes and to the choice of the reference genotype score. For each model, the asymptotic distribution of the likelihood ratio statistic is derived. Simulation studies suggest that the proposed methods are more powerful than existing ones under certain circumstances.
10.
Oriol Canela‐Xandri, Antonio Julià, Josep Lluís Gelpí, Sara Marsal, Genetic Epidemiology, 2012, 36(7): 710-716
The detection of gene‐gene interactions (i.e., epistasis) in the human genome is becoming decisive for the complete characterization of the genetic factors associated with complex binary traits. Although many methods have been developed to address this challenging issue, their performance remains insufficient. We will show how case and control groups store complementary information regarding interactions, and how this fundamental property can be used to design a new, rapid, and highly powerful epistasis analysis method. Unlike previous approaches, where statistical methods are tested over a very limited range of situations, we have performed an exhaustive evaluation of the power of our new method. To this end, we also propose a more comprehensive interpretation of epistasis in which genotype interactions may be risk, protective, or neutral. In this extended view of genetic interactions, we demonstrate that our method performs better than existing approaches, thus providing a highly powerful tool for the identification of gene‐gene interactions associated with binary traits.
11.
In genetic mapping of complex traits, scored haplotypes are likely to represent only a subset of all causal polymorphisms. At the extreme of this scenario, observed polymorphisms are not themselves functional and are linked to causal ones only via linkage disequilibrium (LD). We will demonstrate that, due to such incomplete knowledge of the underlying genetic mechanism, the variance of a trait may differ between the scored haplotypes. Thus, unequal variances between haplotypes may indicate additional functional polymorphisms affecting the trait. Methods accounting for such haplotype-specific variance may also provide increased power to detect complex associations. We suggest ways to estimate and test these haplotypic variance contrasts and to incorporate them into haplotypic tests for association. We further extend this approach to data with unknown gametic phase via likelihood-based simultaneous estimation of haplotypic effects and their frequencies. We find that our approach provides additional power, especially under the following types of models: (a) where scored and unobserved variants interact epistatically with each other; and (b) heterogeneity models, where multiple unobserved mutations are linked to non-functional observed polymorphisms via LD. An illustrative example of the usefulness of the method is discussed, using an analysis of multilocus effects within the catechol-O-methyltransferase gene.
12.
A genome‐wide correlation analysis and cluster analysis were utilized to determine chromosomal regions that had similar nonparametric linkage scores across families in order to locate interacting susceptibility loci for asthma. Conditional analysis was performed to detect any increase in lod score over baseline. Eight of the strongest 5% of the correlations in the German and CSGA asthma data sets occurred in both data sets. The strongest positive correlations found in both data sets were between the 200 cM region on chromosome 2 with chromosome 12 at 90–120 cM (r = 0.26) and also with chromosome 6 at 40–70 cM (r = 0.24). While the cluster analysis did not find any regions that clustered across data sets, this method did detect clustering in regions that have been previously linked to asthma. © 2001 Wiley‐Liss, Inc.
13.
Epistasis (gene‐gene interaction) detection in large‐scale genetic association studies has recently drawn extensive research interest, as many complex traits are likely caused by the joint effect of multiple genetic factors. The large number of possible interactions poses both statistical and computational challenges. A variety of approaches have been developed to address the analytical challenges in epistatic interaction detection. These methods usually output the identified genetic interactions and store them in flat file formats. It is highly desirable to develop an effective visualization tool to further investigate the detected interactions and unravel hidden interaction patterns. We have developed EINVis, a novel visualization tool that is specifically designed to analyze and explore genetic interactions. EINVis displays interactions among genetic markers as a network. It uses a circular layout (specifically, a tree ring view) to simultaneously visualize the hierarchical interactions between single nucleotide polymorphisms (SNPs), genes, and chromosomes, and the network structure formed by these interactions. Using EINVis, the user can distinguish marginal effects from interactions, track interactions involving more than two markers, visualize interactions at different levels, and detect proxy SNPs based on linkage disequilibrium. EINVis is an effective and user‐friendly free visualization tool for analyzing and exploring genetic interactions. It is publicly available, with detailed documentation and an online tutorial, at http://filer.case.edu/yxw407/einvis/ .
14.
Schaid DJ, McDonnell SK, Carlson EE, Thibodeau SN, Stanford JL, Ostrander EA, Genetic Epidemiology, 2008, 32(5): 464-475
Because multiple genes are likely responsible for common complex traits, statistical methods are needed to rapidly screen for either interacting genes or locus heterogeneity in genetic linkage data. To achieve this, some investigators have proposed examining the correlation of pedigree linkage scores between pairs of chromosomal regions, because large positive correlations suggest interacting loci and large negative correlations suggest locus heterogeneity (Cox et al. [1999]; Maclean et al. [1993]). However, the statistical significance of these extreme correlations has been difficult to determine because of the autocorrelation of linkage scores along chromosomes. In this study, we provide novel solutions to this problem by using results from random field theory, combined with simulations to determine the null correlation for syntenic loci. Simulations illustrate that our new methods control the Type I error rate, so that one can avoid both the extremely conservative Bonferroni correction and the extremely time-consuming permutation method for computing P-values for non-syntenic loci. Application of these methods to prostate cancer linkage studies illustrates the interpretation of results and provides insights into the impact of marker information content on the resulting statistical correlations and, ultimately, on the asymptotic P-values.
15.
Todd L. Edwards, Eric Torstensen, Scott Dudek, Eden R. Martin, Marylyn D. Ritchie, Genetic Epidemiology, 2010, 34(2): 194-199
As genetic epidemiology looks beyond mapping single disease susceptibility loci, interest in detecting epistatic interactions between genes has grown. The dimensionality and comparisons required to search the epistatic space and the inference for a significant result pose challenges for testing epistatic disease models. The multifactor dimensionality reduction–pedigree disequilibrium test (MDR‐PDT) was developed to test for multilocus models in pedigree data. In the present study we rigorously tested MDR‐PDT with new cross‐validation (CV) (both 5‐ and 10‐fold) and omnibus model selection algorithms by simulating a range of heritabilities, odds ratios, minor allele frequencies, sample sizes, and numbers of interacting loci. Power was evaluated using 100, 500, and 1,000 families, with minor allele frequencies 0.2 and 0.4 and broad‐sense heritabilities of 0.005, 0.01, 0.03, 0.05, and 0.1 for 2‐ and 3‐locus purely epistatic penetrance models. We also compared the prediction error (PE) measure of effect with a predicted matched odds ratio (MOR) for final model selection and testing. We report that the CV procedure is valid with the permutation test, MDR‐PDT performs similarly with 5‐ and 10‐fold CV, and that the MOR is more powerful than PE as the fitness metric for MDR‐PDT. Genet. Epidemiol. 34: 194–199, 2010. © 2009 Wiley‐Liss, Inc.
16.
Most findings from genome‐wide association studies (GWAS) are consistent with a simple disease model at a single nucleotide polymorphism, in which each additional copy of the risk allele increases risk by the same multiplicative factor, in contrast to dominance or interaction effects. As others have noted, departures from this multiplicative model are difficult to detect. Here, we seek to quantify this both analytically and empirically. We show that imperfect linkage disequilibrium (LD) between causal and marker loci distorts disease models, with the power to detect such departures dropping off very quickly: it decays as a function of r⁴, where r² is the usual correlation between the causal and marker loci, in contrast to the well‐known result that power to detect a multiplicative effect decays as a function of r². We perform a simulation study with empirical patterns of LD to assess how this disease model distortion is likely to impact GWAS results. Among loci where association is detected, we observe that there is reasonable power to detect substantial deviations from the multiplicative model, such as dominant and recessive models. Thus, it is worth explicitly testing for such deviations routinely. Genet. Epidemiol. 35: 278‐290, 2011. © 2011 Wiley‐Liss, Inc.
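To see what r² versus r⁴ decay means in practice, the sketch below assumes (an illustrative reading, not the paper's derivation) that a 1-df test's noncentrality parameter at the marker equals its value at the causal SNP multiplied by r² for the multiplicative effect and by r⁴ for a departure from it. The same starting noncentrality (40, hypothetical) is used for both only to compare how fast power falls off at a genome-wide threshold of 5e-8.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp: float, alpha: float = 5e-8) -> float:
    """Power of a 1-df chi-square test with noncentrality parameter ncp.

    A 1-df noncentral chi-square is the square of N(sqrt(ncp), 1), so the
    power is P(|Z + sqrt(ncp)| > z_crit).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def marker_power(ncp_causal: float, r2: float, alpha: float = 5e-8):
    """Illustrative power at a marker with LD r^2 to the causal SNP.

    Assumption: noncentrality is ncp_causal * r^2 for the multiplicative
    effect and ncp_causal * r^4 for a departure from multiplicativity.
    """
    return power_1df(ncp_causal * r2, alpha), power_1df(ncp_causal * r2 ** 2, alpha)


for r2 in (1.0, 0.8, 0.5):
    main, departure = marker_power(ncp_causal=40.0, r2=r2)
    print(f"r^2 = {r2:.1f}: multiplicative {main:.3f}, departure {departure:.3f}")
```

Under this assumption, at r² = 0.5 the departure test behaves as if it had a quarter of the original sample size, while the multiplicative test behaves as if it had half.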
17.
The heritability of complex diseases, including cancer, is often attributed to multiple interacting genetic alterations. Such a non-linear, non-additive gene–gene interaction effect, that is, epistasis, renders univariable analysis methods ineffective for genome-wide association studies. In recent years, network science has seen increasing applications in modeling epistasis to characterize the complex relationships between a large number of genetic variations and the phenotypic outcome. In this study, by constructing a statistical epistasis network of colorectal cancer (CRC), we proposed using multiple network measures to prioritize genes that influence the disease risk of CRC through synergistic interaction effects. We computed and analyzed several global and local properties of the large CRC epistasis network. We used topological properties of network vertices, such as edge strength, vertex centrality, and occurrence in different graphlets, to identify genes that may be of potential biological relevance to CRC. We found 512 top-ranked single-nucleotide polymorphisms, among which COL22A1, RGS7, WWOX, and CELF2 were the four susceptibility genes prioritized by all described metrics as the most influential on CRC.
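Two of the simplest local measures on such a network are a vertex's degree (number of interaction partners) and its strength (sum of the weights of its incident edges). The sketch below ranks SNPs by strength from a hypothetical edge list of (SNP, SNP, weight) triples, where the weight stands for some pairwise synergy score; it is not the study's pipeline, and the SNP names are placeholders.

```python
from collections import defaultdict


def rank_by_strength(edges):
    """Rank vertices of a weighted epistasis network by strength.

    edges: iterable of (snp_u, snp_v, weight), with weight a non-negative
    pairwise interaction score (hypothetical input format).
    Returns a list of (snp, strength, degree), strongest first; ties are
    broken alphabetically so the order is deterministic.
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for u, v, w in edges:
        for node in (u, v):
            strength[node] += w
            degree[node] += 1
    return sorted(
        ((node, strength[node], degree[node]) for node in strength),
        key=lambda item: (-item[1], item[0]),
    )


edges = [
    ("snp_a", "snp_b", 0.020),
    ("snp_a", "snp_c", 0.010),
    ("snp_b", "snp_c", 0.005),
    ("snp_c", "snp_d", 0.001),
]
for snp, s, k in rank_by_strength(edges):
    print(snp, round(s, 3), k)
```

Strength and degree can disagree: here snp_c has the most partners but ranks third by strength, which is one reason to combine several measures when prioritizing.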
18.
Emily M, Statistics in Medicine, 2012, 31(21): 2359-2373
Epistasis is often cited as the biological mechanism carrying the missing heritability in genome‐wide association studies. However, very few such studies have been reported in the literature. The low power of existing statistical methods is a potential explanation. Statistical procedures are also mainly based on the statistical definition of epistasis, which prevents the detection of SNP–SNP interactions that arise under some classes of epistatic models. In this paper, we propose a new statistic, called IndOR (independence‐based odds ratio), based on the biological definition of epistasis. We assume that epistasis modifies the dependency between the two causal SNPs, and we develop a Wald procedure to test this hypothesis. Our new statistic is compared with three statistical procedures in a large power study on simulated data sets. We use extensive simulations, based on 45 scenarios, to investigate the effect of three factors: the underlying disease model, the linkage disequilibrium, and the control‐to‐case ratio. We demonstrate that our new test is able to detect a wider range of epistatic models. Furthermore, our new statistical procedure is remarkably powerful when the two loci are linked and when the control‐to‐case ratio is higher than 1. Application of our new statistic to the Wellcome Trust Case Control Consortium data set on Crohn's disease corroborates our results on simulated data: IndOR detects a previously reported interaction with more power. Furthermore, a new combination of variants was detected by our new test as significantly associated with Crohn's disease. Copyright © 2012 John Wiley & Sons, Ltd.
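The idea of testing whether epistasis changes the dependency between two SNPs can be sketched with a generic Wald test: measure SNP-SNP dependence in cases and in controls as a log odds ratio from a 2x2 table of carrier status, then test whether the two log odds ratios differ. This is a simplified illustration in the spirit of IndOR, not the published statistic (whose exact form and handling of linked loci are given in the paper); the table layout and the 0.5 continuity correction are assumptions.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.5):
    """Log odds ratio and its variance for the 2x2 table [[a, b], [c, d]].

    Rows: carrier / non-carrier at SNP 1; columns: carrier / non-carrier at
    SNP 2. A 0.5 (Haldane) correction is added to every cell so zero counts
    do not break the logarithm.
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    lor = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return lor, var


def dependence_wald_test(case_table, control_table):
    """Wald test that SNP-SNP dependence (log OR) is equal in cases and controls."""
    lor_case, var_case = log_odds_ratio(*case_table)
    lor_ctrl, var_ctrl = log_odds_ratio(*control_table)
    z = (lor_case - lor_ctrl) / math.sqrt(var_case + var_ctrl)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return z, p_value


# Hypothetical counts: the SNPs co-occur more often in cases than in controls.
z, p = dependence_wald_test((60, 20, 20, 60), (40, 40, 40, 40))
print(f"z = {z:.2f}, P = {p:.2e}")
```

Because the test contrasts dependence between groups, it targets interaction through co-occurrence rather than through a product term in a logistic model.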
19.
Molly A. Hall, Shefali S. Verma, John Wallace, Anastasia Lucas, Richard L. Berg, John Connolly, Dana C. Crawford, David R. Crosslin, Mariza de Andrade, Kimberly F. Doheny, Jonathan L. Haines, John B. Harley, Gail P. Jarvik, Terrie Kitchner, Helena Kuivaniemi, Eric B. Larson, David S. Carrell, Gerard Tromp, Tamara R. Vrabec, Sarah A. Pendergrass, Catherine A. McCarty, Marylyn D. Ritchie, Genetic Epidemiology, 2015, 39(5): 376-384
Bioinformatics approaches to examining gene‐gene models provide a means to discover interactions between multiple genes that underlie complex disease. Extensive computational demands and the need to adjust for multiple testing make uncovering genetic interactions a challenge. Here, we address these issues using our knowledge‐driven filtering method, Biofilter, to identify putative single nucleotide polymorphism (SNP) interaction models for cataract susceptibility, thereby reducing the number of models for analysis. Models were evaluated in 3,377 European Americans (1,185 controls, 2,192 cases) from the Marshfield Clinic, a study site of the Electronic Medical Records and Genomics (eMERGE) Network, using logistic regression. All statistically significant models from the Marshfield Clinic were then evaluated in an independent dataset of 4,311 individuals (742 controls, 3,569 cases), using independent samples from additional study sites in the eMERGE Network: Mayo Clinic, Group Health/University of Washington, Vanderbilt University Medical Center, and Geisinger Health System. Eighty‐three SNP‐SNP models replicated in the independent dataset at likelihood ratio test P < 0.05. Among the most significant replicating models was rs12597188 (intron of CDH1)–rs11564445 (intron of CTNNB1). These genes are known to be involved in processes that include cell‐to‐cell adhesion signaling, cell‐cell junction organization, and cell‐cell communication. Further Biofilter analysis of all replicating models revealed a number of common functions among the genes harboring the 83 replicating SNP‐SNP models, including signal transduction and the PI3K‐Akt signaling pathway. These findings demonstrate the utility of Biofilter as a biology‐driven method, applicable to any genome‐wide association study dataset.
20.
A topical question in genetic association studies is how best to use the information provided by genotyped single-nucleotide polymorphisms (SNPs) to detect the role of a candidate gene in a multifactorial disease. We propose a strategy called the "combination test" that tests the association between a quantitative trait and all possible phased combinations of various numbers of SNPs. We compare this strategy with two alternatives: the association test that considers each SNP separately, and a multilocus genotype-based test that considers the phased combination of all SNPs together. To compare these three tests, a quantitative trait was simulated under different models of correspondence between phenotype and genotype, including the extreme case in which two SNPs interact with no marginal effect of either SNP. The genotypes were taken from a sample of 290 independent individuals genotyped for three genes with varying numbers of SNPs (5-8 SNPs per gene). The results show that the "combination test" is the only one able to detect the association when the two SNPs involved in disease susceptibility interact with no marginal effects. Interestingly, even in the case of a single etiological SNP, the "combination test" performed well. We apply the three tests to the Genetic Analysis Workshop 12 (Almasy et al. [2001] Genet. Epidemiol. 21:332-338) simulated data and show that, although there were no interactions between the etiological SNPs, the "combination test" was preferable to the two other methods for detecting the role of the candidate gene.