Similar documents
20 similar documents found (search time: 31 ms)
1.
Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and the computational perspective, even with a relatively small number of genetic and nongenetic exposures. Several data-mining methods have been proposed for interaction analysis; among them, the Multifactor Dimensionality Reduction (MDR) method has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that combines the strengths of the nonparametric and parametric approaches, was developed to address some of the remaining concerns that accompany an MDR analysis. These include the restriction to univariate, dichotomous traits; the absence of flexible ways to adjust for lower-order effects and important confounders; and the difficulty of highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.

2.
A central goal of human genetics is to identify susceptibility genes for common human diseases. An important challenge is modelling gene-gene interaction, or epistasis, which can result in nonadditivity of genetic effects. The multifactor dimensionality reduction (MDR) method was developed as a machine learning alternative to parametric logistic regression for detecting interactions in the absence of significant marginal effects. The goal of MDR is to reduce the dimensionality inherent in modelling combinations of polymorphisms using a computational approach called constructive induction. Here, we propose a Robust Multifactor Dimensionality Reduction (RMDR) method that performs constructive induction using Fisher's exact test rather than a predetermined threshold. The advantage of this approach is that only statistically significant genotype combinations are considered in the MDR analysis. We use simulation studies to demonstrate that this approach increases the success rate of MDR when only a few genotype combinations are significantly associated with case-control status, and we show that there is no loss of success rate when this is not the case. We then apply the RMDR method to detect gene-gene interactions in genotype data from a population-based study of bladder cancer in New Hampshire.
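To make the contrast between threshold-based and test-based labelling of genotype cells concrete, here is a minimal Python sketch for two-locus cells. It is an illustration under stated assumptions, not the published RMDR implementation: genotypes are coded as minor-allele counts (0/1/2), the classic MDR step labels a cell high-risk when its case:control ratio exceeds a threshold (default 1, i.e. a balanced design), the test-based variant compares each cell against all other cells with a two-sided Fisher's exact test, and non-significant cells are simply left unlabelled (None). The function names `mdr_labels`, `rmdr_labels` and `fisher_exact_two_sided` and the toy data are hypothetical.

```python
from collections import Counter
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x cases in row 1 given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def _cell_counts(genotypes, status):
    """Case and control counts per multilocus genotype cell."""
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    return cases, controls


def mdr_labels(genotypes, status, threshold=1.0):
    """Classic MDR step: a cell is high-risk (True) if its case:control ratio exceeds threshold."""
    cases, controls = _cell_counts(genotypes, status)
    labels = {}
    for cell in set(cases) | set(controls):
        a, b = cases[cell], controls[cell]
        labels[cell] = a > 0 if b == 0 else a / b > threshold
    return labels


def rmdr_labels(genotypes, status, alpha=0.05, threshold=1.0):
    """Test-based variant: only cells with Fisher p-value below alpha get a high/low label."""
    cases, controls = _cell_counts(genotypes, status)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        a, b = cases[cell], controls[cell]
        p = fisher_exact_two_sided(a, b, n_cases - a, n_controls - b)
        if p < alpha:
            labels[cell] = "high" if b == 0 or a / b > threshold else "low"
        else:
            labels[cell] = None  # not significant: excluded from the high/low pooling
    return labels


# Toy data: cell (2, 2) is enriched for cases, (1, 1) for controls, (0, 0) is balanced
genotypes = [(2, 2)] * 22 + [(0, 0)] * 40 + [(1, 1)] * 40
status = [1] * 20 + [0] * 2 + [1] * 20 + [0] * 20 + [1] * 10 + [0] * 30
print(mdr_labels(genotypes, status))
print(rmdr_labels(genotypes, status))
```

In this toy data the threshold rule forces every cell into one of two groups, whereas the test-based rule only labels the two cells whose case:control imbalance is statistically significant.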

3.

Background  

There is a growing awareness that interactions between multiple genes play an important role in the risk of common, complex multifactorial diseases. Many common diseases are affected by particular genotype combinations (involving certain genes and their interactions). The identification and characterization of these susceptibility genes and gene-gene interactions have been limited by small sample sizes and the large number of potential interactions between genes. Several methods have been proposed to detect gene-gene interactions in case-control studies. Penalized logistic regression (PLR), a variant of logistic regression with L2 regularization, is a parametric approach to detecting gene-gene interactions. Multifactor Dimensionality Reduction (MDR), by contrast, is a nonparametric, genetic-model-free approach to detecting genotype combinations associated with disease risk.

4.
Association studies that genotype affected offspring and their parents (triads) offer robustness to genetic population structure while enabling assessment of maternal effects, parent‐of‐origin effects, and gene‐by‐environment interaction. We propose case‐parent designs that use pooled DNA specimens to make economical use of limited available specimens. The number of genotyping assays required can be markedly reduced by randomly partitioning the case‐parent triads into pooling sets of h triads each and creating three pools from every pooling set: one each for mothers, fathers, and offspring. Maximum‐likelihood estimation of relative risk parameters proceeds via log‐linear modeling using the expectation‐maximization algorithm. The approach can assess offspring and maternal genetic effects and can accommodate genotyping errors and missing genotypes. We compare the power of our proposed analysis for testing offspring and maternal genetic effects with that of a difference approach and with that of the gold standard based on individual genotypes, under a range of allele frequencies, missing-parent proportions, and genotyping error rates. Power calculations show that the pooling strategies cause only modest reductions in power when genotyping error rates are low, while reducing genotyping costs and conserving limited specimens.

5.
The search for susceptibility loci involved in gene–gene interactions poses a methodological and computational challenge for statisticians because of the large dimensionality inherent in modelling gene–gene interactions, or epistasis. In an era in which genome-wide scans have become relatively common, powerful new methods are required to handle the huge number of possible gene–gene interactions and to weed out false positives and false negatives from the results. One solution to the dimensionality problem is to reduce the data by preliminary screening of markers to select the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction (MDR) method is a nonparametric, model-free data reduction technique for associating sets of markers that have optimal predictive properties with disease. In this study, we examine the power of MDR in larger data sets and compare it with other approaches that can identify gene–gene interactions. Under various interaction models (purely and not purely epistatic), we use a Random Forest (RF)-based prescreening method before executing MDR to improve its performance. We find that the power of MDR increases when noisy SNPs are first removed by creating a collection of candidate markers with RFs. We validate our technique by extensive simulation studies and by application to asthma data from the European Committee of Respiratory Health Study II.

6.
Quantitative trait loci (QTLs) are being used to study genetic networks, protein functions, and systems properties that underlie phenotypic variation and disease risk in humans, model organisms, agricultural species, and natural populations. The challenges are many, beginning with the seemingly simple tasks of mapping QTLs and identifying their underlying genetic determinants. Various specialized resources have been developed to study complex traits in many model organisms. In the mouse, remarkably different pictures of genetic architecture are emerging. Chromosome Substitution Strains (CSSs) reveal many QTLs, large phenotypic effects, pervasive epistasis, and readily identified genetic variants. In contrast, other resources, as well as genome-wide association studies (GWAS) in humans and other species, reveal genetic architectures dominated by a relatively modest number of QTLs with small individual and combined phenotypic effects. These contrasting architectures result from intrinsic differences in the study designs underlying the different resources. CSSs examine context-dependent phenotypic effects independently among individual genotypes, whereas GWAS and other mouse resources assess the average effect of each QTL across many individuals with heterogeneous genetic backgrounds. We argue that variation of genetic architectures among individuals is as important as population averages. Each of these resources has particular merits and specific applications for these individual and population perspectives. Collectively, these resources, together with high-throughput genotyping, sequencing and genetic engineering technologies, and information repositories, highlight the power of the mouse for genetic, functional, and systems studies of complex traits and disease models.

7.
The search for the missing heritability in genome-wide association studies (GWAS) has become an important focus for the human genetics community. One suspected location of these genetic effects is in gene-gene interactions, or epistasis. The computational burden of exploring gene-gene interactions in the wealth of data generated in GWAS, along with small to moderate sample sizes, has made epistasis an afterthought rather than a primary focus of GWAS analyses. In this review, I discuss some potential approaches to filter a GWAS dataset down to a smaller, more manageable dataset in which searching for epistasis is considerably more feasible. I describe a number of alternative approaches but focus primarily on using prior biological knowledge from public-domain databases to guide the search for epistasis. Prior knowledge can be incorporated into a GWA study in many ways, and these data can be extracted from a variety of database sources. I discuss a number of these approaches and propose that a comprehensive approach will likely be the most fruitful way to search for epistasis in current state-of-the-art large-scale genomic studies and in those to come.

8.
Aggressive periodontitis (AgP) is a multifactorial disease. A distinctive aspect of periodontitis is that it involves a large number of genes that interact with one another and form complex networks, so gene-gene interaction can reasonably be expected to play a crucial role. We therefore carried out a pilot case-control study to assess the association between candidate epistatic interactions among genetic risk factors and susceptibility to AgP, using both conventional parametric analyses and a higher-order interaction model based on the nonparametric Multifactor Dimensionality Reduction algorithm. We analyzed 122 AgP patients and 246 appropriately selected periodontally healthy individuals, and genotyped 28 polymorphisms located within 14 candidate genes, chosen from the principal genetic variants reported in the literature and having a role in inflammation and immunity. Our analyses provided significant evidence for gene-gene interactions in the development of AgP. In particular, the present results (a) indicate a possible role for two new polymorphisms, within the SEPS1 and TNFRSF1B genes, in determining individual host susceptibility to AgP; (b) confirm the potential association of IL-6 and Fcγ receptor polymorphisms with the disease; and (c) exclude an essential contribution of IL-1 cluster gene polymorphisms to AgP in our Caucasian-Italian population.

9.
Undetected genotyping errors pose a problem in genetic epidemiological studies, as they may invalidate statistical analysis or reduce its power. Haplotype analysis requires higher data quality, because a haplotype can be inferred correctly only if the genotypes of all its markers are correct. Here, we present a method that identifies probable genotyping errors in trio samples with the help of the estimated haplotype frequency distribution of the sample. If the likelihood of the most likely haplotype explanation depends strongly on a single genotype, in the sense that setting that genotype to missing leads to a much more likely haplotype explanation, the genotype is considered a potential genotyping error. We describe a method that systematically searches the whole data set for such potential errors. Based on the haplotype distribution of a real data set, we carry out a simulation study to estimate the sensitivity and specificity of the method. In addition, we apply our approach to the real data set itself, and potentially erroneous genotypes are re-determined by sequencing. The results of both the simulation study and the application to the real data set show that a considerable proportion of true genotyping errors is detected and that the number of false-positive signals is acceptable. We conclude that it is indeed possible to identify probable genotyping errors by considering haplotypes. The method described here will be part of the next release of our FAMHAP software.

10.
AM Rose, LC Bell. Immunology, 2012, 137(2): 131-138
Autoimmune disorders are a complex and varied group of diseases caused by a breakdown of self-tolerance. The aetiology of autoimmunity is multifactorial, with both environmental triggers and genetically determined risk factors. In recent years, it has been increasingly recognized that genetic risk factors do not act in isolation; rather, the combination of individual additive effects, gene-gene interactions and gene-environment interactions determines the overall risk of autoimmunity. The importance of gene-gene interactions, or epistasis, has recently been brought into focus by research demonstrating that many autoimmune diseases, including rheumatoid arthritis, autoimmune glomerulonephritis, systemic lupus erythematosus and multiple sclerosis, are influenced by epistatic interactions. This review examines the basic mechanisms of epistasis, how epistasis influences the immune system, and the role of epistasis in two major autoimmune conditions, systemic lupus erythematosus and multiple sclerosis.

11.
The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene‐gene interactions. MDR relies on classification error in a cross‐validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR but has not considered the accuracy and variance of the MDR prediction error estimate. Here, we evaluate the bias and variance of the MDR error estimate as both a retrospective and a prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and we propose a bootstrap resampling estimate, integrating population prevalence, to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. Although demonstrated with MDR, the proposed estimation is applicable to all data‐mining methods that use similar estimates.
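Why a case-control (retrospective) error rate can differ from the error in the target population can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator proposed in the abstract: it reweights the per-class error rates observed in a case-control sample by an assumed population prevalence, and it bootstraps a fixed set of held-out predictions to obtain a standard error instead of refitting the model in each replicate. The function names and toy labels are hypothetical.

```python
import random


def prospective_error(y_true, y_pred, prevalence):
    """Misclassification rate in a population with the given disease prevalence.

    Per-class error rates are estimated from the (case-control) sample and reweighted:
    error = prevalence * P(pred=0 | case) + (1 - prevalence) * P(pred=1 | control).
    """
    case_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    control_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    false_negative_rate = case_preds.count(0) / len(case_preds)
    false_positive_rate = control_preds.count(1) / len(control_preds)
    return prevalence * false_negative_rate + (1 - prevalence) * false_positive_rate


def bootstrap_prospective_error(y_true, y_pred, prevalence, n_boot=1000, seed=0):
    """Bootstrap mean and standard error of the prevalence-weighted error.

    Resamples (label, prediction) pairs with replacement; predictions are held fixed,
    i.e. the model is NOT refit inside each replicate.
    """
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        if 0 < sum(yt) < n:  # need at least one case and one control
            estimates.append(prospective_error(yt, yp, prevalence))
    if len(estimates) < 2:
        raise ValueError("too few valid bootstrap replicates")
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return mean, var ** 0.5


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
print(prospective_error(y_true, y_pred, prevalence=0.1))  # 0.05
print(bootstrap_prospective_error(y_true, y_pred, prevalence=0.1))
```

In this toy sample the raw error is 2/8 = 0.25, but at an assumed prevalence of 10% the population error is 0.05, because controls dominate the population and every control here is classified correctly.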

12.
Multiple sclerosis (MS) is a common disease of the central nervous system characterized by inflammation, myelin loss, gliosis, varying degrees of axonal pathology, and progressive neurological dysfunction. MS exhibits many of the characteristics that distinguish complex genetic disorders, including polygenic inheritance and environmental exposure risks. Here, we used a highly efficient multilocus genotyping assay representing variation in 34 genes associated with inflammatory pathways to explore gene-gene interactions and disease susceptibility in a well-characterized African-American case-control MS data set. We applied the multifactor dimensionality reduction (MDR) test to detect epistasis and identified a single-locus association model (IL4R(Q576R)) and a three-locus association model (IL4R(Q576R), IL5RA(-80), CD14(-260)) that predict MS risk with 75-76% accuracy (P<0.01). These results demonstrate the importance of exploring both main effects and gene-gene interactions in the study of complex diseases.

13.
Most common human diseases are likely to have complex etiologies. Methods of analysis that allow for the phenomenon of epistasis are of growing interest in the genetic dissection of complex diseases. By allowing for epistatic interactions between potential disease loci, we may succeed in identifying genetic variants that might otherwise have remained undetected. Here we aimed to analyze the ability of logistic regression (LR) and two tree‐based supervised learning methods, classification and regression trees (CART) and random forest (RF), to detect epistasis. Multifactor‐dimensionality reduction (MDR) was also used for comparison. Our approach first simulates datasets of autosomal, biallelic, unphased and unlinked single nucleotide polymorphisms (SNPs), each containing a two‐locus interaction (causal SNPs) and 98 ‘noise’ SNPs. We modelled interactions under different scenarios of sample size, missing data, minor allele frequency (MAF) and several penetrance models: three involving both (indistinguishable) marginal effects and interaction, and two simulating pure interaction effects. In total, we simulated 99 different scenarios. Although CART, RF, and LR yield similar results in terms of detecting true associations, CART and RF perform better than LR with respect to classification error. The ability of the different techniques to detect true associations depends more on MAF, penetrance model, and sample size than on the percentage of missing data. In pure interaction models, only RF detects association. In conclusion, tree‐based methods and LR are important statistical tools for detecting unknown interactions among true risk‐associated SNPs with marginal effects in the presence of a significant number of noise SNPs. In pure interaction models, RF performs reasonably well with large sample sizes and low percentages of missing data. However, when the study design is suboptimal (unfavourable for detecting interaction in terms of, e.g., sample size and MAF), there is a high chance of detecting false, spurious associations.
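A "pure interaction" model of the kind simulated above is a two-locus penetrance table in which neither locus shows a marginal effect. A minimal sketch, assuming an XOR-style table with made-up penetrances (0.05 and 0.20) and unlinked loci in Hardy-Weinberg equilibrium (not the study's actual penetrance models), shows that the absence of marginal effects can hold at one minor allele frequency and break at another. Function names are hypothetical.

```python
def hwe_genotype_freqs(maf):
    """Frequencies of 0, 1 and 2 minor-allele copies under Hardy-Weinberg equilibrium."""
    q, p = maf, 1.0 - maf
    return [p * p, 2 * p * q, q * q]


def xor_penetrance(low=0.05, high=0.20):
    """Hypothetical two-locus table: high risk when the total minor-allele count is odd."""
    return [[high if (g1 + g2) % 2 else low for g2 in range(3)] for g1 in range(3)]


def marginal_penetrance(table, freqs_locus2):
    """P(disease | genotype at locus 1), averaging over locus 2 (unlinked, independent loci)."""
    return [sum(f * table[g1][g2] for g2, f in enumerate(freqs_locus2)) for g1 in range(3)]


table = xor_penetrance()
for maf in (0.5, 0.2):
    margins = marginal_penetrance(table, hwe_genotype_freqs(maf))
    print(maf, [round(m, 4) for m in margins])
```

At MAF 0.5 all three marginal penetrances equal 0.125, so a single-locus test sees nothing; at MAF 0.2 they become 0.098, 0.152 and 0.098, and a marginal effect appears even though the table itself is unchanged.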

14.
In general, several issues, such as Hardy-Weinberg equilibrium and Mendelian errors, are examined before genetic data are analyzed. Although missing genotypes are commonly observed in genetic studies, potential bias due to informative missingness is usually overlooked. The Test of Informative Missingness (TIM) was the first attempt to determine whether parental genotypes are missing informatively. The TIM is a useful tool for genetic data cleaning; for example, excluding single-nucleotide polymorphisms that appear to be missing informatively may further improve the quality of genetic data. Although the TIM has decent power, its performance is noticeably weaker when the minor allele/genotype introduces informative missingness. To avoid this loss of power, the newly proposed strategy detects informative missingness by comparing inconsistent linkage disequilibrium signals between intact case-parent triads and incomplete data. Computer simulations revealed that the new method was robust to population stratification and more powerful than the TIM in most situations. In addition, the new method demonstrated decent power in a genome-wide association study setting, even when the most conservative correction for multiple testing was adopted.

15.
Gene-gene interactions have been proposed as an important component of the genetic architecture of complex diseases and are just beginning to be evaluated in the context of genome-wide association studies (GWAS). Beyond detecting epistasis, interaction analysis also increases power to detect weak main effects. We conducted a knowledge-driven interaction analysis of a GWAS of 931 multiple sclerosis (MS) trios to discover gene-gene interactions within established biological contexts. We identify heterogeneous signals, including a gene-gene interaction between CHRM3 (muscarinic cholinergic receptor 3) and MYLK (myosin light-chain kinase) (joint P=0.0002), an interaction between two phospholipase C-β isoforms, PLCβ1 and PLCβ4 (joint P=0.0098), and a modest interaction between ACTN1 (actinin alpha 1) and MYH9 (myosin heavy chain 9) (joint P=0.0326), all localized to calcium-signaled cytoskeletal regulation. Furthermore, we discover a main effect (joint P=5.2E-5), previously unidentified by single-locus analysis, within another related gene, SCIN (scinderin), a calcium-binding cytoskeleton regulatory protein. This work illustrates that knowledge-driven interaction analysis of GWAS data is a feasible approach to identifying new genetic effects. These results are among the first gene-gene interactions and non-immune susceptibility loci reported for MS. Further, the implicated genes cluster within inter-related biological mechanisms that suggest a neurodegenerative component to MS.

16.
Complex quantitative traits are often a function of the coordinated action of many physically independent genetic factors. Interactive properties of multilocus genotypes, such as epistasis, are thought to be pervasive components of the genetic architecture of complex phenotypes. Here, we use a panel of interspecific backcross introgression lines to evaluate the genetic architecture of song variation, a quantitative sexual signaling phenotype, in the Hawaiian swordtail cricket genus Laupala. Allelic effects across five quantitative trait loci are consistent with a purely additive model of gene action, in which alleles at multiple loci have fully independent and discrete effects on the sexual signaling phenotype. Whereas a more complex genetic architecture featuring non-additive dominance and epistasis components may constrain potential evolutionary trajectories and reduce the rate of evolutionary change, the polygenic, additive genetic architecture observed for sexual signaling in Laupala should respond rapidly to directional selection pressures and move freely through phenotypic space. This classic type I genetic architecture may facilitate the explosive radiation of song variation observed across the Laupala genus.

17.
Neuropeptide S receptor 1 (NPSR1, GPR154, GPRA) has been verified as a susceptibility gene for asthma and related phenotypes. The ligand for NPSR1, neuropeptide S (NPS), activates signalling through NPSR1, and microarray analysis has identified Tenascin C (TNC) as a target gene of NPS-NPSR1 signalling. TNC has previously been implicated as a risk gene for asthma. We therefore aimed to study the genetic association of TNC with asthma- and allergy-related disorders, as well as the biological and genetic interactions between NPSR1 and TNC. Regulation of TNC was investigated using NPS-stimulated, NPSR1-transfected cells. We genotyped 12 TNC SNPs in the cross-sectional PARSIFAL study (3113 children) and performed single-SNP association, haplotype association, and TNC-NPSR1 gene-gene interaction analyses. Our experimental results show NPS-dependent upregulation of TNC mRNA. The genotyping results indicate single-SNP and haplotype associations for several SNPs in TNC, the most significant being an association with rhinoconjunctivitis for a haplotype with a frequency of 29% in cases (P = 0.0005). For asthma and atopic sensitization, significant gene-gene interactions were found between TNC and NPSR1 SNPs, indicating that, depending on the NPSR1 genotype, TNC can be associated with either an increased or a decreased risk of disease. We conclude that variation in TNC modifies risk not only for asthma but also for rhinoconjunctivitis. Furthermore, we show epistasis based on both a suggested direct regulatory effect and a genetic interaction between NPSR1 and TNC. These results suggest a merging of previously independent pathways of importance in the development of asthma- and allergy-related traits.

18.
Gene-gene interactions have received much attention recently because most human traits may be under the control of several genetic factors, as well as environmental factors, and these factors likely interact with each other to influence these traits. Gauderman (2002) and Wang & Zhao (2003) have reported systematic studies on the statistical power to detect gene-gene interactions through association studies. In this article we investigated the power of the affected sib pair (ASP) design to detect gene-gene interaction at two disease loci. Different definitions of gene-gene interaction were considered, and different disease models (including both the logistic models considered in previous studies and several two-locus models with fixed penetrances) were examined. Our results indicate that comparisons of the power to detect gene-gene interaction using ASP designs and association designs depend heavily on the definition of gene-gene interaction. When gene-gene interaction is defined as departure from independence between the two marginal IBD sharings, the association design is much more powerful than the ASP design; the additive model is more powerful than the dominant and recessive models for rare diseases, whereas for common diseases (e.g., with a population prevalence of 10%) the recessive model is more powerful than the additive and dominant models. Under the definitions of departure from a multiplicative model, an additive model, or a heterogeneity model (Risch, 1990), the ASP design is as powerful as, or more powerful than, both family-based and population-based association designs for rare diseases, but less powerful for more common diseases. Under the definition of correlation between the two marginal IBD sharings, the association design is much more powerful than the ASP design.
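Why conclusions depend on the definition of interaction can be seen with a small worked example on penetrances. This is a hedged sketch with hypothetical penetrance values; it covers only the multiplicative and additive definitions, not the IBD-sharing or heterogeneity definitions used in the article. Each locus is reduced to carrier (1) versus non-carrier (0) of a risk genotype, and the function name is hypothetical.

```python
def interaction_measures(f00, f10, f01, f11):
    """Departure from two 'no interaction' null models for a 2x2 penetrance table.

    f_ij = P(disease | status i at locus A, status j at locus B), 1 = carries the risk genotype.
    Returns (multiplicative_ratio, additive_deviation):
      multiplicative_ratio == 1 -> no interaction on the risk-ratio scale
                                   (f11/f00 == (f10/f00) * (f01/f00))
      additive_deviation == 0   -> no interaction on the risk-difference scale
                                   (f11 - f00 == (f10 - f00) + (f01 - f00))
    """
    multiplicative_ratio = (f11 * f00) / (f10 * f01)
    additive_deviation = f11 - f10 - f01 + f00
    return multiplicative_ratio, additive_deviation


# Hypothetical penetrances: each table shows interaction under one definition only
print(interaction_measures(0.01, 0.02, 0.03, 0.04))  # purely additive joint effect
print(interaction_measures(0.01, 0.02, 0.03, 0.06))  # purely multiplicative joint effect
```

The first table has no additive interaction but a multiplicative ratio of about 0.67 (a "negative" interaction on the risk-ratio scale); the second has no multiplicative interaction but an additive deviation of 0.02. A power comparison between designs therefore has to fix which null model is being tested.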

19.
In genetic studies, the transmission/disequilibrium test (TDT) using case-parent triads has gained popularity owing to its robustness to population admixture. Several extensions have been proposed to accommodate incomplete triads. Some strategies assume that parental genotypes are missing completely at random (MCAR) to ensure an unbiased conclusion, and some methods allow parental genotypes to be missing informatively, resulting in reduced power when the missing data pattern is in fact MCAR. However, these tests assumed that offspring genotypes were MCAR. Recently, Guo indicated that when offspring genotypes are missing informatively, which can be considered a form of ascertainment bias, inflated type I error and/or reduced power may occur using the TDT when incomplete triads are excluded. To avoid an erroneous conclusion, we propose a strategy called testing informative missingness (TIM) that compares conditional distributions of parental genotypes among complete triads and incomplete data with only one parent to examine the missing data pattern. Computer simulations show that TIM has decent power to detect informative missingness and is robust to population admixture. In addition, we illustrate TIM with an application to the Framingham Heart Study.
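For reference, the basic TDT on complete case-parent triads can be sketched as below. This is a minimal illustration, assuming a biallelic SNP coded as minor-allele counts (0/1/2) and complete, Mendelian-consistent triads; this is exactly the complete-triad setting whose missing-data assumptions the abstract above examines, and the sketch does not implement TIM. Function names and the toy triads are hypothetical.

```python
import math


def count_transmissions(triads):
    """Count minor-allele (b) and major-allele (c) transmissions from heterozygous parents.

    triads: iterable of (father, mother, child) genotypes coded as minor-allele counts 0/1/2.
    Assumes complete, Mendelian-consistent triads (no missing parental genotypes).
    """
    b = c = 0
    for father, mother, child in triads:
        n_het = (father == 1) + (mother == 1)
        if n_het == 0:
            continue  # homozygous parents carry no information for the TDT
        if n_het == 2:
            # Both parents heterozygous: the child's minor-allele count equals
            # the number of minor-allele transmissions out of two.
            b += child
            c += 2 - child
        else:
            homozygous = mother if father == 1 else father
            from_het = child - homozygous // 2  # a 0-copy parent gives 0 minor alleles, a 2-copy parent gives 1
            b += from_het
            c += 1 - from_het
    return b, c


def tdt(b, c):
    """McNemar-type TDT statistic (b - c)^2 / (b + c) and its 1-df chi-square p-value."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square with 1 df
    return stat, p_value


triads = [(1, 0, 1), (1, 0, 0), (1, 1, 2), (1, 1, 1), (2, 1, 2), (0, 0, 0)]
b, c = count_transmissions(triads)
print(b, c, tdt(b, c))
```

Dropping incomplete triads before calling `count_transmissions` is the simple exclusion strategy whose validity depends on the missing-data pattern discussed above.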

20.
Genome-wide association studies have identified a large number of single-nucleotide polymorphisms (SNPs) that individually predispose to diseases. However, many genetic risk factors remain unaccounted for. Proteins coded by genes interact in the cell, and it is most likely that certain variants affect the phenotype mainly in combination with other variants, a phenomenon termed epistasis. An exhaustive search for epistatic effects is computationally demanding, as several billion SNP pairs exist for typical genotyping chips. In this study, experimental knowledge of biological networks is used to narrow the search for two-locus epistasis. We provide evidence that this approach is computationally feasible and statistically powerful. By applying this method to the Wellcome Trust Case–Control Consortium data sets, we report four significant cases of epistasis between unlinked loci, in susceptibility to Crohn's disease, bipolar disorder, hypertension and rheumatoid arthritis.
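The saving from restricting the search to biologically connected genes can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the SNP-to-gene map and the gene network are hypothetical toy inputs, each SNP is assumed to map to exactly one gene, and the chip size of 100,000 SNPs is chosen only for the arithmetic.

```python
from collections import defaultdict
from itertools import product
from math import comb


def n_exhaustive_pairs(n_snps):
    """Number of distinct SNP pairs an exhaustive two-locus scan must test."""
    return comb(n_snps, 2)


def network_pairs(snp_to_gene, gene_edges):
    """SNP pairs restricted to genes joined by an edge in a (hypothetical) interaction network.

    snp_to_gene: dict mapping each SNP ID to a single gene (assumption: one gene per SNP).
    gene_edges: iterable of (gene, gene) tuples; duplicate edges and self-loops are ignored.
    """
    snps_by_gene = defaultdict(list)
    for snp, gene in snp_to_gene.items():
        snps_by_gene[gene].append(snp)
    edges = {frozenset(e) for e in gene_edges if e[0] != e[1]}
    pairs = []
    for edge in edges:
        g1, g2 = sorted(edge)
        # Every SNP in one gene is paired with every SNP in its network neighbour
        pairs.extend(product(sorted(snps_by_gene[g1]), sorted(snps_by_gene[g2])))
    return pairs


print(n_exhaustive_pairs(100_000))  # 4999950000, i.e. about 5 billion pairs
snp_to_gene = {"rs1": "A", "rs2": "A", "rs3": "B", "rs4": "C"}
print(network_pairs(snp_to_gene, [("A", "B"), ("B", "A"), ("B", "C")]))
```

Grouping SNPs by gene first means the work scales with the number of network edges and the number of SNPs per gene, rather than with all n(n-1)/2 pairs.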


Copyright© Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417