Similar Literature
20 similar documents found
1.
The genetic case-control association study of unrelated subjects is a leading method to identify single nucleotide polymorphisms (SNPs) and SNP haplotypes that modulate the risk of complex diseases. Association studies often genotype several SNPs in a number of candidate genes; we propose a two-stage approach to address the inherent statistical multiple comparisons problem. In the first stage, each gene's association with disease is summarized by a single p-value that controls a familywise error rate. In the second stage, summary p-values are adjusted for multiplicity using a false discovery rate (FDR) controlling procedure. For the first stage, we consider marginal and joint tests of SNPs and haplotypes within genes, and we construct an omnibus test that combines SNP and haplotype analysis. Simulation studies show that when disease susceptibility is conferred by a SNP, and all common SNPs in a gene are genotyped, marginal analysis of SNPs using the Simes test has similar or higher power than marginal or joint haplotype analysis. Conversely, haplotype analysis can be more powerful when disease susceptibility is conferred by a haplotype. The omnibus test tracks the more powerful of the two approaches, which is generally unknown. Multiple testing balances the desire for statistical power against the implicit costs of false positive results, which up to now appear to be common in the literature.
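
The two-stage scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each gene's first-stage summary is taken to be the Simes combination of its per-SNP p-values (the haplotype and omnibus tests are not shown), and the second stage uses the Benjamini-Hochberg step-up procedure as the FDR-controlling method, which the abstract does not name explicitly. Gene names and p-values are made up.

```python
def simes_pvalue(pvals):
    """Simes summary p-value for one gene: min over i of m * p_(i) / i."""
    m = len(pvals)
    ordered = sorted(pvals)
    return min(m * p / (i + 1) for i, p in enumerate(ordered))


def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the BH step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank  # largest rank whose p-value is under its BH line
    return sorted(order[:k])


def two_stage(gene_pvals, q=0.05):
    """Stage 1: Simes per gene. Stage 2: BH across the gene summaries."""
    summary = {gene: simes_pvalue(ps) for gene, ps in gene_pvals.items()}
    genes = list(summary)
    rejected = benjamini_hochberg([summary[g] for g in genes], q)
    return summary, [genes[i] for i in rejected]


if __name__ == "__main__":
    # Hypothetical per-SNP p-values for three candidate genes.
    data = {"GENE_A": [0.001, 0.20, 0.50],
            "GENE_B": [0.30, 0.60],
            "GENE_C": [0.01, 0.02, 0.04, 0.9]}
    print(two_stage(data))
```

Because Simes is applied within genes and BH across genes, the number of second-stage tests equals the number of genes rather than the number of SNPs.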

2.
A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous Single Nucleotide Polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of ‘tag SNPs’ for association is unknown. Compared to tests for association based on frequencies of haplotypes, recent evidence suggests that tests for association based on linear combinations of the tag SNPs (Hotelling T2 test) are more powerful. Following this logical progression, we asked whether single‐locus tests would prove generally more powerful than the regression‐based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, TB, or by permutation of case‐control status, TP; a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, TS; and the Hotelling T2 procedure, which we call TR. These procedures are evaluated by simulating data that resemble those from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, TS has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10–20 SNPs per gene), power of TP is generally superior to that for the other procedures, including TR. 
Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of TP and TR are indistinguishable. Genet. Epidemiol. © 2005 Wiley‐Liss, Inc.
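
The TP procedure (maximum statistic, calibrated by permuting case-control status) can be sketched as follows. This is a minimal sketch, assuming genotypes coded as 0/1/2 allele counts and using the absolute difference in mean allele count between cases and controls as a stand-in per-SNP statistic; the paper's per-SNP test statistic may differ. Permuting labels preserves the LD among SNPs, which is why TP does not pay the conservativeness penalty that Bonferroni (TB) pays when tag SNPs are correlated.

```python
import random


def snp_stat(genotypes, labels):
    """|mean allele count in cases - mean allele count in controls| for one SNP.
    Illustrative stand-in statistic, not necessarily the one used in the paper."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat(snps, labels):
    """Maximum of the per-SNP statistics across all tag SNPs in the gene."""
    return max(snp_stat(g, labels) for g in snps)


def permutation_max_test(snps, labels, n_perm=999, seed=1):
    """TP-style p-value: rank of the observed maximum in its permutation distribution."""
    rng = random.Random(seed)
    observed = max_stat(snps, labels)
    permuted = list(labels)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling case-control status breaks any genotype-phenotype link
        # but leaves the correlation (LD) among the SNPs untouched.
        rng.shuffle(permuted)
        if max_stat(snps, permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Usage: `permutation_max_test([[2] * 10 + [0] * 10, [1, 0] * 10], [1] * 10 + [0] * 10)` returns a small p-value, because the first SNP separates cases from controls perfectly.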

3.
A new multimarker test for family-based association studies   Total citations: 1, self-citations: 0, citations by others: 1

4.
Genome‐wide association studies (GWASs) commonly use marginal association tests for each single‐nucleotide polymorphism (SNP). Because these tests treat SNPs as independent, their power is suboptimal for detecting SNPs hidden by linkage disequilibrium (LD). One way to improve power is to use a multiple regression model. However, the large number of SNPs precludes fitting them all simultaneously in a multiple regression, and subset regression is infeasible because of the exorbitant number of candidate subsets. We therefore propose a new method for detecting hidden SNPs that are significant in a multiple regression model yet show only weak marginal association. Our method begins by constructing a bidirected graph locally around each SNP that shows a moderately sized marginal association signal (the focal SNPs). Vertices correspond to SNPs, and adjacency between vertices is defined by an LD measure. Subsequently, the method collects from each graph all shortest paths to the focal SNP. Finally, for each shortest path, the method fits a multiple regression model to all the SNPs lying on the path and tests the significance of the regression coefficient corresponding to the terminal SNP in the path. Simulation studies show that the proposed method can detect susceptibility SNPs hidden by LD that go undetected with marginal association testing or with existing multivariate methods. When applied to real GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), our method detected two groups of SNPs: one in a region containing the apolipoprotein E (APOE) gene, and another in a region close to the semaphorin 5A (SEMA5A) gene.
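
The graph step (LD graph around a focal SNP, then all shortest paths to it) can be sketched with a breadth-first search that records every shortest-path predecessor. This is a minimal sketch, assuming a symmetric matrix of pairwise r² values and a hypothetical adjacency cut-off of r² ≥ 0.5; the abstract does not state which LD measure or threshold is used. The per-path multiple regression is not shown; each returned path lists the SNPs that would enter it, starting from the far-end SNP (our reading of "terminal SNP") and ending at the focal SNP.

```python
from collections import deque


def ld_graph(r2, threshold=0.5):
    """Adjacency sets: SNPs i and j are linked when r2[i][j] >= threshold (cut-off is an assumption)."""
    n = len(r2)
    return {i: {j for j in range(n) if j != i and r2[i][j] >= threshold} for i in range(n)}


def shortest_paths_to(graph, focal):
    """For every SNP reachable from `focal`, all shortest paths from that SNP to `focal`."""
    dist = {focal: 0}
    preds = {focal: []}
    queue = deque([focal])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first time v is reached: record distance
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:  # another equally short route into v
                preds[v].append(u)

    def paths_from(v):
        if v == focal:
            return [[focal]]
        return [[v] + rest for p in preds[v] for rest in paths_from(p)]

    return {v: paths_from(v) for v in dist if v != focal}
```

With four SNPs where 0-1, 0-2, 1-3 and 2-3 exceed the cut-off and SNP 0 is focal, SNP 3 has two shortest paths, `[3, 1, 0]` and `[3, 2, 0]`, so two regressions would be fitted for it.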

5.
Candidate gene association studies often analyze only one single nucleotide polymorphism (SNP), and an initial report is typically not replicated by subsequent studies. The failure to replicate may result from incomplete or poor identification of disease-related variants or haplotypes, possibly due to naive SNP selection. A method for identification of linkage disequilibrium (LD) groups and selection of SNPs that capture sufficient intra-genic genetic diversity is described. We assume all SNPs with minor allele frequency above a pre-determined frequency have been identified. Principal component analysis (PCA) is applied to evaluate multivariate SNP correlations to infer groups of SNPs in LD (LD-groups) and to establish an optimal set of group-tagging SNPs (gtSNPs) that provide the most comprehensive coverage of intra-genic diversity while minimizing the resources necessary to perform an informative association analysis. This PCA method differs from haplotype block (HB) and haplotype-tagging SNP (htSNP) methods in that an LD-group of SNPs need not be a contiguous DNA fragment. Results of the PCA method compared well with existing htSNP methods while also providing advantages over those methods, including an indication of the optimal number of SNPs needed. Further, evaluation of the method over multiple replicates of simulated data indicated PCA to be a robust method for SNP selection. Our findings suggest that PCA may be a powerful tool for establishing an optimal SNP set that maximizes the amount of genetic variation captured for a candidate gene using a minimal number of SNPs.
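
One way PCA can indicate how many SNPs are needed is to count the principal components required to explain most of the variance in the SNP correlation matrix. The sketch below is a hedged, stdlib-only illustration: it computes Pearson correlations between 0/1/2 genotype vectors, obtains eigenvalues with the cyclic Jacobi method, and counts components up to a hypothetical 90% cumulative-variance cut-off. The abstract does not give the authors' exact criterion, and the grouping of SNPs into LD-groups and the choice of gtSNPs are not shown.

```python
import math


def correlation_matrix(snps):
    """Pearson correlations between SNP genotype vectors (SNPs must be polymorphic)."""
    def centered(x):
        m = sum(x) / len(x)
        return [v - m for v in x]
    cs = [centered(s) for s in snps]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cs]
    n = len(snps)
    return [[sum(a * b for a, b in zip(cs[i], cs[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues (descending) of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def n_components_for(snps, fraction=0.9):
    """Number of leading components whose eigenvalues reach `fraction` of the total variance."""
    eig = jacobi_eigenvalues(correlation_matrix(snps))
    total = sum(eig)
    running = 0.0
    for k, lam in enumerate(eig, start=1):
        running += lam
        if running >= fraction * total - 1e-9:
            return k
    return len(eig)
```

Three SNPs in perfect LD (including one with the alleles coded in reverse) collapse to a single component, whereas uncorrelated SNPs each need their own.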

6.
It has been proposed that using association analysis of single nucleotide polymorphism (SNP) markers in candidate genes may be more successful in identifying disease susceptibility genes for complex diseases. Finding all the SNPs within a candidate gene and genotyping a large case‐control cohort is a resource‐intensive process. As linkage disequilibrium extends across small regions of the genome, the expectation is that a few common anonymous SNPs will be sufficient to detect functional disease‐associated alleles. The aim of this investigation was to compare the ability of a number of family‐ and population‐based association methods to identify known susceptibility loci using the Genetic Analysis Workshop 12 simulated data set. As expected, case‐control methods were more likely to detect association with individual SNPs but family‐based haplotyping methods appeared better able to localize the position of functional polymorphism. © 2001 Wiley‐Liss, Inc.

7.
Genomewide association studies (GWAS) sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple single‐nucleotide polymorphisms (SNPs) simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow‐up studies. Current multi‐SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA‐dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights. It estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing an SNP prioritization that best identifies underlying true signals, we show the following: our method easily outperforms a single‐marker analysis; when additive‐only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive‐only effects; and when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation.
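
A joint additive-plus-dominance model needs two covariates per SNP that enter or leave the group LASSO together. The sketch below assumes the common heterozygote-indicator coding for the dominance term (the abstract does not give LLARRMA-dawg's exact parameterization) and shows the heterozygote dosage mentioned for imputed SNPs. The group LASSO fit and the resampling step are not shown, and all helper names are hypothetical.

```python
def additive_dominance_codes(genotype):
    """Hard-called genotype (0, 1 or 2 copies of the coded allele) -> (additive, dominance)."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    return float(genotype), 1.0 if genotype == 1 else 0.0


def dosage_codes(p0, p1, p2):
    """Imputed genotype probabilities -> (expected allele dosage, heterozygote dosage)."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2 * p2, p1


def design_row(genotypes):
    """One subject's design-matrix row plus group labels for a group LASSO."""
    row, groups = [], []
    for j, g in enumerate(genotypes):
        a, d = additive_dominance_codes(g)
        row += [a, d]
        groups += [j, j]  # both columns of SNP j share one group penalty
    return row, groups
```

Grouping the two columns means a SNP is selected or dropped as a whole, so an additive-only signal and a dominance signal compete on equal footing.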

8.
In this article, we develop a powerful test for identifying single nucleotide polymorphism (SNP)-sets that are predictive of survival with data from genome-wide association studies. We first group typed SNPs into SNP-sets based on genomic features and then apply a score test to assess the overall effect of each SNP-set on the survival outcome through a kernel machine Cox regression framework. This approach uses genetic information from all SNPs in the SNP-set simultaneously and accounts for linkage disequilibrium (LD), leading to a powerful test with reduced degrees of freedom when the typed SNPs are in LD with each other. This type of test also has the advantage of capturing the potentially nonlinear effects of the SNPs, SNP-SNP interactions (epistasis), and the joint effects of multiple causal variants. By simulating SNP data based on the LD structure of real genes from the HapMap project, we demonstrate that our proposed test is more powerful than the standard single SNP minimum P-value-based test for association studies with censored survival outcomes. We illustrate the proposed test with a real data application.
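
The kernel-machine score statistic has the general quadratic form Q = rᵀKr, where K measures genetic similarity between subjects across the whole SNP-set and r holds residuals from the null model without SNPs. The sketch below is a hedged illustration: it uses the identity-by-state (IBS) kernel, one common choice that can capture nonlinear effects, and assumes the residuals (for a Cox model, martingale residuals) are computed elsewhere. The paper's exact scaling and the mixture-of-chi-squares p-value are not shown.

```python
def ibs_kernel(genotypes):
    """IBS similarity between subjects: mean over SNPs of (2 - |g_i - g_j|) / 2."""
    n, p = len(genotypes), len(genotypes[0])
    return [[sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j])) / (2 * p)
             for j in range(n)] for i in range(n)]


def kernel_score_statistic(kernel, residuals):
    """Quadratic form Q = r' K r, with r the null-model residuals (assumed precomputed)."""
    n = len(residuals)
    return sum(residuals[i] * kernel[i][j] * residuals[j]
               for i in range(n) for j in range(n))
```

Large Q means subjects who are genetically similar across the SNP-set also tend to have similar residual survival, which is the signal the score test looks for.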

9.
Understanding the genetic and metabolic bases of obesity is helpful in planning and developing health strategies. Therefore, the first family-based joint linkage and linkage disequilibrium study in Iranian pedigrees was conducted to assess the relationship between obesity and single-nucleotide polymorphisms (SNPs) located in the 16q12.2 region. A total of 13,344 individuals were included, of whom 12,502 belonged to 3,109 pedigrees and 842 were unrelated singletons. To investigate the relationship between obesity and genetic variants, a joint model of linkage and linkage disequilibrium was applied. Moreover, a sequence kernel association test (SKAT) was used to evaluate the association of the SNP set with body size and lipid profile measurements. The joint model showed that rs13334070, in intron 4 of the RPGRIP1L gene, is significantly associated with obesity. According to the 4-gamete rule, a procedure for constructing SNP sets that accounts for recombination between SNPs, this polymorphism is highly correlated with six nearby SNPs that together form a SNP set. SKAT showed that this SNP set is significantly associated with body size factors but shows almost no association with most of the lipid profile measurements. In conclusion, these results suggest that RPGRIP1L may be an important gene whose variants could be associated with obesity risk factors.
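
The four-gamete rule infers historical recombination between two SNPs when all four two-locus haplotypes (00, 01, 10, 11) are observed. The sketch below assumes phased 0/1 haplotypes and builds contiguous SNP sets greedily, starting a new set whenever the next SNP shows all four gametes with any SNP already in the current set. The greedy construction and the `min_freq` cut-off (a gamete counts only above this frequency) are assumptions; the abstract does not describe how the study assembled its sets.

```python
def four_gametes_present(haplotypes, i, j, min_freq=0.0):
    """True if all four allele combinations at SNPs i and j occur above min_freq."""
    counts = {}
    for h in haplotypes:
        key = (h[i], h[j])
        counts[key] = counts.get(key, 0) + 1
    n = len(haplotypes)
    return sum(1 for c in counts.values() if c / n > min_freq) == 4


def four_gamete_blocks(haplotypes, min_freq=0.0):
    """Split SNPs (columns) into contiguous sets with no inferred recombination inside."""
    n_snps = len(haplotypes[0])
    blocks, start = [], 0
    for j in range(1, n_snps):
        if any(four_gametes_present(haplotypes, i, j, min_freq) for i in range(start, j)):
            blocks.append(list(range(start, j)))
            start = j  # recombination detected: SNP j opens a new set
    blocks.append(list(range(start, n_snps)))
    return blocks
```

For the haplotypes `["000", "001", "110", "111"]`, SNPs 0 and 1 always travel together while SNP 2 shows all four gametes with SNP 0, giving the sets `[[0, 1], [2]]`.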

10.
Here we summarize the contributions to Group 13 of the Genetic Analysis Workshop 15 held in St. Pete Beach, Florida, on November 12-14, 2006. The focus of this group was to identify candidate genes associated with rheumatoid arthritis or surrogate outcomes. The association methods proposed in this group were diverse, ranging from well-known approaches, such as logistic regression for single nucleotide polymorphism (SNP) analysis and haplotype sharing tests, to methods less familiar to genetic epidemiologists, such as machine learning and visualization methods. The majority of papers analyzed Genetic Analysis Workshop 15 Problems 2 (rheumatoid arthritis data) and 3 (simulated data). The main points highlighted by this group's analyses were: (1) haplotype-based statistics can be more powerful than single SNP analysis for risk-locus localization; (2) considering linkage disequilibrium block structure in haplotype analysis may reduce the likelihood of false-positive results; and (3) visual representation of genetic models for continuous covariates may help identify SNPs associated with the underlying quantitative trait loci.

11.
Many family‐based tests of linkage disequilibrium are not valid when related nuclear families from larger pedigrees are used, or when independent nuclear families with multiple cases are used. The Pedigree Disequilibrium Test (PDT) proposed by Martin et al. [Am J Hum Genet 67:146–54, 2000] avoids these problems. This paper sketches an extension of the PDT that can account for measured covariates. Where the PDT is based on allele‐counting methods, this extension is based on conditional logistic regression. Versions of these statistics were used to test for association between disease and two known functional single nucleotide polymorphisms (SNPs) on gene 1 and gene 6 and one inert SNP on gene 7 in the first 25 replicates of the simulated population‐isolate data. The new method was also used to test for linkage disequilibrium after correcting for the effect of the environmental factor E1. The PDT and the conditional logistic extension had similar power to detect the functional SNPs (100% for gene 1, approximately 50% for gene 6) and appropriate type I error rates for the inert SNP. Correcting for E1 slightly increased power to detect the association between gene 6 and disease. © 2001 Wiley‐Liss, Inc.
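
The allele-counting PDT that the extension builds on can be sketched as follows. For each family j, D_j averages two kinds of contributions: for each affected child with both parents genotyped, (copies transmitted) minus (copies not transmitted), which equals 2·child − (parent1 + parent2) and needs no phase; and for each discordant sibling pair, copies in the affected sib minus copies in the unaffected sib. The statistic is T = ΣD_j / √(ΣD_j²), approximately N(0, 1) under the null. This sketch covers only the original PDT, not the conditional logistic regression extension, and it simplifies by treating every triad and sibling pair passed in as informative.

```python
import math


def family_score(trios, sib_pairs):
    """D_j for one family; all genotypes are copy counts (0, 1 or 2) of the tested allele.
    trios: (parent1, parent2, affected_child); sib_pairs: (affected, unaffected)."""
    terms = [2 * c - (p1 + p2) for p1, p2, c in trios]  # transmitted minus untransmitted
    terms += [a - u for a, u in sib_pairs]
    return sum(terms) / len(terms) if terms else 0.0


def pdt_statistic(families):
    """T = sum(D_j) / sqrt(sum(D_j^2)) over families given as (trios, sib_pairs)."""
    d = [family_score(t, s) for t, s in families]
    denom = math.sqrt(sum(x * x for x in d))
    return sum(d) / denom if denom > 0 else 0.0
```

Averaging within a family before summing is what keeps the test valid when a pedigree contributes several correlated triads or sibling pairs.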

12.
For a dense set of genetic markers such as single nucleotide polymorphisms (SNPs) in high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive because it reduces data complexity and can increase statistical power. However, because the status of the underlying disease variant is unknown, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptibility locus (DSL) that may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, using a retrospective likelihood to adjust for case-control sampling ascertainment and to handle the Hardy-Weinberg equilibrium constraint appropriately, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and estimate the joint haplotype frequencies of the DSL and markers simultaneously. Because the latent variable allows a flexible role for the DSL, the likelihood ratio statistic can then provide a joint association test for the set of markers without requiring an adjustment for testing multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios compared with classical haplotype association methods.
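
The EM idea at the core of such haplotype methods is easiest to see in the classic two-SNP case: only double heterozygotes have ambiguous phase (00/11 versus 01/10), and EM splits them in proportion to the current haplotype frequencies under Hardy-Weinberg equilibrium. The sketch below is this standard haplotype-frequency EM, not the authors' retrospective latent-variable penetrance model; genotypes are assumed to be (g1, g2) copy counts of allele 1.

```python
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(g1, g2):
    """Unordered haplotype pairs whose allele sums reproduce the genotype (g1, g2)."""
    return [(h, k) for i, h in enumerate(HAPS) for k in HAPS[i:]
            if h[0] + k[0] == g1 and h[1] + k[1] == g2]


def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Two-SNP haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    freqs = {h: 0.25 for h in HAPS}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for g1, g2 in genotypes:
            pairs = compatible_pairs(g1, g2)
            # E-step: probability of each phase under the current frequencies.
            weights = [freqs[h] * freqs[k] * (1 if h == k else 2) for h, k in pairs]
            total = sum(weights)
            for (h, k), w in zip(pairs, weights):
                share = w / total if total > 0 else 1 / len(pairs)
                counts[h] += share
                counts[k] += share
        # M-step: expected haplotype counts divided by 2N chromosomes.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in HAPS) < tol
        freqs = new
        if converged:
            break
    return freqs
```

When no subject is a double heterozygote, the first iteration already returns the simple haplotype counts; otherwise the ambiguous subjects are pulled toward the phase that is common among unambiguous subjects.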

13.
Emily M. Statistics in Medicine 2012;31(21):2359-2373
Epistasis is often cited as the biological mechanism carrying the missing heritability in genome‐wide association studies. However, very few studies reporting epistasis have appeared in the literature. The low power of existing statistical methods is a potential explanation. Existing statistical procedures are also mainly based on the statistical definition of epistasis, which prevents them from detecting SNP–SNP interactions under some classes of epistatic models. In this paper, we propose a new statistic, called IndOR (independence‐based odds ratio), based on the biological definition of epistasis. We assume that epistasis modifies the dependency between the two causal SNPs, and we develop a Wald procedure to test this hypothesis. Our new statistic is compared with three statistical procedures in a large power study on simulated data sets. We use extensive simulations, based on 45 scenarios, to investigate the effect of three factors: the underlying disease model, the linkage disequilibrium, and the control‐to‐case ratio. We demonstrate that our new test can detect a wider range of epistatic models. Furthermore, our new statistical procedure is remarkably powerful when the two loci are linked and when the control‐to‐case ratio is higher than 1. Applying our new statistic to the Wellcome Trust Case Control Consortium data set on Crohn's disease supports our results on simulated data: IndOR detects a previously reported interaction with more power, and it also detected a new combination of variants significantly associated with Crohn's disease. Copyright © 2012 John Wiley & Sons, Ltd.
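
The idea of testing epistasis as a change in the dependency between two SNPs can be illustrated with a generic Wald test. This is not the IndOR statistic itself, whose exact definition the abstract does not give. The sketch assumes each SNP has been dichotomized (for example, carrier versus non-carrier), builds a 2×2 table of the two SNPs separately in cases and controls, and compares the two log odds ratios using Woolf's variance, adding 0.5 to every cell when any cell is empty.

```python
import math


def log_or_and_var(n11, n10, n01, n00):
    """Log odds ratio of a 2x2 SNP-by-SNP table and its Woolf variance."""
    cells = [n11, n10, n01, n00]
    if 0 in cells:  # Haldane-Anscombe correction for empty cells
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dependency_wald_test(case_table, control_table):
    """Wald z and two-sided p for a difference in SNP-SNP log odds ratio, cases vs controls."""
    lc, vc = log_or_and_var(*case_table)
    lk, vk = log_or_and_var(*control_table)
    z = (lc - lk) / math.sqrt(vc + vk)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p
```

Identical tables in cases and controls give z = 0 and p = 1; a stronger SNP-SNP association among cases than among controls gives a positive z.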

14.
Qin H, Zhu X. Genetic Epidemiology 2012;36(3):235-243
When dense markers are available, one can interrogate almost every common variant across the genome via imputation and single nucleotide polymorphism (SNP) tests, which have become routine in current genome-wide association studies (GWASs). As a complement, admixture mapping exploits the long-range linkage disequilibrium (LD) generated by admixture between genetically distinct ancestral populations. This raises the question of whether admixture mapping analysis is still necessary for detecting disease-associated variants in admixed populations. We argue that admixture mapping can reduce the burden of massive comparisons in GWASs; it can therefore be a powerful tool for locating disease variants with substantial allele frequency differences between ancestral populations. In this report we studied a two-stage approach, in which candidate regions are defined by admixture mapping at stage 1, and single SNP association tests are then performed at stage 2 within the candidate regions defined at stage 1. We first established by simulation the genome-wide significance levels corresponding to the criteria used to define the candidate regions at stage 1. We next compared the power of the two-stage approach with that of direct association analysis. Our simulations suggest that the two-stage approach can be more powerful than standard genome-wide association analysis when the allele frequency difference of a causal variant between ancestral populations is larger than 0.4. Our conclusion is consistent with a theoretical prediction by Risch and Tang ([2006] Am J Hum Genet 79:S254). Surprisingly, our study also suggests that power can be improved by using less strict criteria to define the candidate regions at stage 1.

15.
Haplotype sharing analysis was used to investigate the association of affection status with single nucleotide polymorphism (SNP) haplotypes within candidate gene 1 in one sample each from the isolated and the general population of Genetic Analysis Workshop (GAW) 12 simulated data. Gene 1 has direct influence on affection and harbors more than 70 SNPs. Haplotype sharing analysis depends heavily on previous haplotype estimation. Using GENEHUNTER haplotypes, strong evidence was found for most SNPs in the isolated population sample, thus providing evidence for an involvement of this gene, but the maximum -log10(p) values for the haplotype sharing statistics (HSS) test statistic did not correspond to the location of the true variant in either population. In comparison, transmission disequilibrium test (TDT) analysis showed the strongest results at the disease-causing variant in both populations, and these were outstanding in the general population. In this example, TDT analysis appears to perform better than HSS in identifying the disease-causing variant, using SNPs within a candidate gene in an outbred population. Simulations showed that the performance of HSS is hampered by closely spaced SNPs in strong linkage disequilibrium with the functional variant and by ambiguous haplotypes.
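
For reference, the TDT used as the comparator is the McNemar-type statistic of Spielman and colleagues: among heterozygous parents, b counts transmissions of the tested allele and c counts transmissions of the other allele, and (b − c)²/(b + c) is compared with a chi-square distribution on 1 degree of freedom. The sketch below computes it with the standard library only (the 1-df chi-square tail equals erfc(√(x/2))); the haplotype sharing statistic is not shown.

```python
import math


def tdt(b, c):
    """TDT chi-square (1 df) and p-value from heterozygous-parent transmission counts."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

For example, 60 versus 40 transmissions gives a chi-square of 4.0 and a p-value of about 0.046.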

16.
This article applies the recently proposed "stability selection" procedure of Meinshausen and Bühlmann to the problem of variable selection in genome-wide association. In particular, it explores whether stability selection can identify new regions of interest originally missed or can call into legitimate question regions originally flagged. Our analysis of the seven data sets of the Wellcome Trust Case-Control Consortium suggests that stability selection effectively controls the family-wise error rate but suffers a loss of power. The extensive correlation structure among SNP markers induced by linkage disequilibrium renders the procedure too conservative, causing it to miss regions known to be highly significant from simple marginal analyses. As a remedy one can aggregate nearby SNPs into groups and select groups rather than individual SNPs. The modified procedure can accurately identify the most important regions of genome-wide association, but in a simulation study it still offers less power than simpler and less computationally intensive methods of marginal association testing.
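
Stability selection repeatedly subsamples half of the subjects, runs a base variable selector on each subsample, and keeps variables selected in at least a fraction π_thr of runs. Meinshausen and Bühlmann bound the expected number of falsely selected variables by q² / ((2π_thr − 1)p), where q is the number of variables selected per run and p the total number. The sketch below swaps in a hypothetical top-k marginal-covariance selector for the Lasso normally used, to stay dependency-free; the bound also relies on the exchangeability and better-than-random-guessing assumptions of the original paper.

```python
import random


def top_k_marginal(rows, y, k):
    """Hypothetical base selector: the k columns with the largest |covariance| with y."""
    n, p = len(rows), len(rows[0])
    ybar = sum(y) / n
    scores = []
    for j in range(p):
        xbar = sum(r[j] for r in rows) / n
        cov = sum((r[j] - xbar) * (yi - ybar) for r, yi in zip(rows, y)) / n
        scores.append((abs(cov), j))
    return {j for _, j in sorted(scores, reverse=True)[:k]}


def stability_selection(rows, y, k=2, n_subsamples=100, pi_thr=0.6, seed=0):
    """Selection frequencies over half-subsamples, the stable set, and the E[V] bound."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    freq = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample half of the subjects
        for j in top_k_marginal([rows[i] for i in idx], [y[i] for i in idx], k):
            freq[j] += 1
    freq = [f / n_subsamples for f in freq]
    stable = [j for j in range(p) if freq[j] >= pi_thr]
    bound = k ** 2 / ((2 * pi_thr - 1) * p)  # expected false selections, needs pi_thr > 0.5
    return stable, freq, bound
```

The grouped variant discussed in the abstract would replace single SNP columns with groups of nearby SNPs before the selector runs.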

17.
Recent studies have shown that quantitative phenotypes may be influenced not only by multiple single nucleotide polymorphisms (SNPs) within a gene but also by the interaction between SNPs at unlinked genes. We propose a new statistical approach that can detect gene‐gene interactions at the allelic level which contribute to the phenotypic variation in a quantitative trait. By testing for the association of allelic combinations at multiple unlinked loci with a quantitative trait, we can detect the SNP allelic interaction whether or not it can be detected as a main effect. Our proposed method assigns a score to unrelated subjects according to their allelic combination inferred from observed genotypes at two or more unlinked SNPs, and then tests for the association of the allelic score with a quantitative trait. To investigate the statistical properties of the proposed method, we performed a simulation study to estimate type I error rates and power and demonstrated that this allelic approach achieves greater power than the more commonly used genotypic approach to test for gene‐gene interaction. As an example, the proposed method was applied to data obtained as part of a candidate gene study of sodium retention by the kidney. We found that this method detects an interaction among the calcium‐sensing receptor gene (CaSR), the chloride channel gene (CLCNKB) and the Na‐K‐2Cl cotransporter gene (SLC12A1) that contributes to variation in diastolic blood pressure. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.

18.
Molecular epidemiology of preterm delivery: methodology and challenges   Total citations: 6, self-citations: 0, citations by others: 6
Preterm delivery (PTD) appears to be a complex trait determined by both genetic and environmental factors. Few studies have examined genetic influence on PTD. The overall goal of our study is to examine major candidate genes for PTD and to test gene–environment interactions. Our study includes 500 preterm trios (500 preterm babies and their parents) and 500 maternal age-matched term controls. We will perform the transmission/disequilibrium test (TDT) on candidate genes thought to be important in each of the four biological pathways of PTD: (1) decidual–chorioamniotic inflammation: interleukin 1 (IL-1), IL-6, and tumour necrosis factor (TNF); (2) maternal and fetal stress: corticotropin-releasing hormone (CRH); (3) uteroplacental vascular lesions: methylenetetrahydrofolate reductase (MTHFR); and (4) susceptibility to environmental toxins: GSTM1, GSTT1, CYP1A1, CYP2D6, CYP2E1, NAT2, NQO1, ALDH2, and EPHX. We will also perform standard case-control analyses on the 500 preterm cases and 500 term controls to examine gene–environment interactions. The major environmental, nutritional and social factors, as well as clinical variables known or suspected to be associated with PTD, will be used to test for gene–environment interactions. This study integrates epidemiological and clinical data as well as genetic markers along major pathogenic pathways of PTD. The findings from this study should improve our understanding of genetic influences on PTD and gene–environment interactions.

19.
In the new era of large‐scale collaborative Genome Wide Association Studies (GWAS), population stratification has become a critical issue that must be addressed. In order to build upon the methods developed to control the confounding effect of a structured population, it is extremely important to visualize and quantify that effect. In this work, we develop methodology for single nucleotide polymorphism (SNP) selection and subsequent population stratification visualization based on deviation from Hardy‐Weinberg equilibrium in conjunction with non‐metric multidimensional scaling (MDS), a distance‐based multivariate technique. Through simulation, it is shown that SNP selection based on Hardy‐Weinberg disequilibrium (HWD) is robust against confounding linkage disequilibrium patterns that have been problematic in past studies and methods, and that it produces a differentiated SNP set. Through theoretical and empirical study of HapMap samples, non‐metric MDS is shown to be preferable to principal components as a multivariate visualization tool when combined with HWD SNP selection. The proposed selection tool offers a simple and effective way to select appropriate substructure‐informative markers for use in exploring the effect that population stratification may have in association studies. Genet. Epidemiol. 33:488–496, 2009. © 2009 Wiley‐Liss, Inc.
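
Deviation from Hardy-Weinberg equilibrium at a biallelic SNP can be summarized by the inbreeding coefficient F = 1 − H_obs/H_exp or by the 1-df HWE chi-square, which equals nF² for a biallelic locus. Pooling genetically distinct subpopulations produces a heterozygote deficit (positive F, the Wahlund effect), which is why HWD can flag substructure-informative SNPs. The sketch below ranks SNPs by the chi-square; the ranking rule, the `top` cut-off and the SNP names are hypothetical, and the non-metric MDS step is not shown.

```python
def hwd_stats(n_aa, n_ab, n_bb):
    """Inbreeding coefficient F and HWE chi-square from genotype counts of one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 0.0, 0.0  # monomorphic SNP carries no information about HWE
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1 - (n_ab / n) / (2 * p * q)  # positive F = heterozygote deficit
    return f, chi2


def select_hwd_snps(genotype_counts, top=2):
    """Names of the `top` SNPs with the largest HWE chi-square."""
    scored = sorted(((hwd_stats(*c)[1], name) for name, c in genotype_counts.items()),
                    reverse=True)
    return [name for _, name in scored[:top]]
```

For counts (40, 20, 40), the allele frequency is 0.5, F = 0.6 and the chi-square is 36 = 100 × 0.6².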

20.
Tag SNP selection for association studies   Total citations: 6, self-citations: 0, citations by others: 6
This report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal selection of SNPs based upon maximizing either the predictability of unmeasured SNPs or the predictability of SNP haplotypes as selection criteria. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence upon disease risk of common disease-causing variants in candidate genes in a haplotype-based analysis. We propose a quasi-empirical approach towards evaluating the power of large studies with this calculation based upon the SNP genotype and haplotype frequencies estimated in a haplotype discovery panel. In this calculation, each common SNP in turn is treated as a potential unmeasured causal variant and subjected to a correlation analysis using the remaining SNPs. We use a small portion of the HapMap ENCODE data (488 common SNPs genotyped over approximately a 500 kb region of chromosome 2) as an illustrative example of this approach towards power evaluation.
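
A simple and widely used way to make unmeasured SNPs predictable from tags is greedy pairwise tagging: repeatedly pick the SNP that covers (r² ≥ threshold) the most not-yet-covered SNPs. The sketch below is that generic greedy method, not the specific criteria compared in the report; it assumes polymorphic SNPs coded as 0/1/2 genotype counts, uses genotype (composite) r² rather than haplotype-based r², and the 0.8 threshold is a hypothetical choice.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def greedy_tags(snps, threshold=0.8):
    """Indices of tag SNPs such that every SNP has r^2 >= threshold with some tag."""
    n = len(snps)
    r2 = [[r_squared(snps[i], snps[j]) for j in range(n)] for i in range(n)]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # Most newly covered SNPs wins; ties go to the lowest index.
        best = max(range(n), key=lambda i: (sum(1 for j in uncovered if r2[i][j] >= threshold), -i))
        tags.append(best)
        uncovered -= {j for j in uncovered if r2[best][j] >= threshold}
    return tags
```

Every SNP tags itself (r² = 1), so each round covers at least one new SNP and the loop always terminates.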


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP registration no. 09084417)