Similar articles
Found 20 similar articles (search time: 15 ms)
1.
With the emergence of biobanks alongside large‐scale genome‐wide association studies (GWAS), we will soon be in the enviable position of having precise estimates of population allele frequencies for the SNPs that make up the panels of standard genotyping arrays, such as those produced by Illumina and Affymetrix. For disease association studies it is well known that, for rare diseases with known population minor allele frequencies (pMAFs), a case‐only design is most powerful. That is, for a fixed budget the optimal procedure is to genotype only cases (affecteds). In such tests experimenters look for a divergence of the allele distribution in cases from the known pMAF in order to test the null hypothesis of no association between disease status and allele frequency. However, what has not been previously characterized is the utility of controls (known unaffecteds) when available. In this study we consider frequentist and Bayesian statistical methods for testing for SNP genotype association when population MAFs are known and when both cases and controls are available. We demonstrate that for rare diseases the most powerful frequentist design is, somewhat counterintuitively, to actively discard the controls even though they contain information on the association. In contrast, we develop a Bayesian test which uses all available information (cases and controls) and appears to exhibit uniformly greater power than all frequentist methods we considered. Genet. Epidemiol. 33:371–378, 2009. © 2009 Wiley‐Liss, Inc.
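
The case-only comparison described in this abstract can be illustrated with a one-sample test of the case allele frequency against a known pMAF. This is a minimal sketch, not the authors' frequentist or Bayesian test. It assumes Hardy-Weinberg proportions (so the 2n case alleles are treated as independent draws), a rare disease (so the pMAF approximates the null frequency), and a normal approximation. The function name and its arguments are hypothetical.

```python
import math

def case_only_z_test(minor_allele_count, n_cases, pmaf):
    """Two-sided z-test of the case minor allele frequency against a known pMAF.

    Assumption: the 2 * n_cases alleles are independent draws
    (diploid cases in Hardy-Weinberg proportions).
    """
    n_alleles = 2 * n_cases
    p_hat = minor_allele_count / n_alleles
    # Standard error under the null hypothesis (frequency equals the pMAF).
    se = math.sqrt(pmaf * (1.0 - pmaf) / n_alleles)
    z = (p_hat - pmaf) / se
    # Two-sided normal tail probability.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Illustrative numbers only: 130 minor alleles among 500 cases, pMAF = 0.10.
z, p = case_only_z_test(minor_allele_count=130, n_cases=500, pmaf=0.10)
print(f"z = {z:.3f}, p = {p:.4g}")
```

No control genotypes enter the calculation, which is the sense in which the design is "case-only".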

2.
Large-scale genome-wide association studies (GWAS) have become feasible recently because of the development of bead and chip technology. However, the success of GWAS partially depends on the statistical methods that are able to manage and analyze this sort of large-scale data. Currently, the commonly used tests for GWAS include the Cochran-Armitage trend test, the allelic χ² test, the genotypic χ² test, the haplotypic χ² test, and the multi-marker genotypic χ² test, among others. From a methodological point of view, it is a great challenge to improve the power of commonly used tests, since these tests are commonly used precisely because they are already among the most powerful tests. In this article, we propose an improved score test that is uniformly more powerful than the score test based on the generalized linear model. Since the score test based on the generalized linear model includes the aforementioned commonly used tests as its special cases, our proposed improved score test is thus uniformly more powerful than these commonly used tests. We evaluate the performance of the improved score test by simulation studies and application to a real data set. Our results show that the power increases of the improved score test over the score test cannot be neglected in most cases.
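
As an illustration of one of the commonly used tests named in this abstract, the following is a minimal sketch of the Cochran-Armitage trend test in its usual large-sample, 1-degree-of-freedom form. It is not the improved score test the authors propose. The function name and the additive weights (0, 1, 2) are assumptions, and degenerate tables (for example, no cases or a single genotype class) are not handled.

```python
import math

def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (chi2, p_value) for the 1-df Cochran-Armitage trend test.

    cases / controls: counts of genotypes carrying 0, 1 and 2 copies of the allele.
    weights: genotype scores; (0, 1, 2) corresponds to an additive model.
    """
    n = [r + s for r, s in zip(cases, controls)]   # genotype column totals
    N = sum(n)                                     # total sample size
    R = sum(cases)                                 # number of cases
    S = N - R                                      # number of controls
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * m for t, m in zip(weights, n))
    sum_t2n = sum(t * t * m for t, m in zip(weights, n))
    numerator = N * (N * sum_tr - R * sum_tn) ** 2
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Illustrative counts only.
chi2, p = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
print(f"chi2 = {chi2:.3f}, p = {p:.3g}")
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) gives the dominant and recessive versions of the same statistic.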

3.
Although genome‐wide association studies (GWAS) have identified thousands of trait‐associated genetic variants, there are relatively few findings on the X chromosome. For analysis of low‐frequency variants (minor allele frequency <5%), investigators can use region‐ or gene‐based tests where multiple variants are analyzed jointly to increase power. To date, there are no gene‐based tests designed for association testing of low‐frequency variants on the X chromosome. Here we propose three gene‐based tests for the X chromosome: burden, sequence kernel association test (SKAT), and optimal unified SKAT (SKAT‐O). Using simulated case‐control and quantitative trait (QT) data, we evaluate the calibration and power of these tests as a function of (1) male:female sample size ratio; and (2) coding of haploid male genotypes for variants under X‐inactivation. For case‐control studies, all three tests are reasonably well‐calibrated for all scenarios we evaluated. As expected, power for gene‐based tests depends on the underlying genetic architecture of the genomic region analyzed. Studies with more (haploid) males are generally less powerful due to decreased number of chromosomes. Power generally is slightly greater when the coding scheme for male genotypes matches the true underlying model, but the power loss for misspecifying the (generally unknown) model is small. For QT studies, type I error and power results largely mirror those for binary traits. We demonstrate the use of these three gene‐based tests for X‐chromosome association analysis in simulated data and sequencing data from the Genetics of Type 2 Diabetes (GoT2D) study.

4.
In a genome‐wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X‐chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X‐chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X‐chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X‐chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single‐marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X‐chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions.

5.
Whole genome association studies are generating data sets with hundreds of thousands of markers genotyped on thousands of cases and controls. We show that whole genome haplotypic association testing with permutation to account for multiple testing is statistically powerful and computationally feasible on such data, using an efficient software implementation of a recently proposed method. We use realistic simulations to explore the statistical properties of the method, and show that for ungenotyped disease-susceptibility variants with population frequencies of 5% or less the haplotypic tests have markedly better power than single-marker tests. We propose a combined single-marker and haplotypic strategy, in which both single-marker and haplotypic tests are applied, with the minimum P-value adjusted for multiple testing by permutation, which results in a test that is powerful for detecting both low- and high-frequency disease-susceptibility variants.

6.
Detecting the association between a set of variants and a phenotype of interest is the first and important step in genetic and genomic studies. Although this problem has attracted a large amount of attention in the scientific community and several related statistical approaches have been proposed in the literature, powerful and robust statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful and robust association test, which combines information from each individual single-nucleotide polymorphism based on sequential independent burden tests. We compare the proposed approach with some popular tests through a comprehensive simulation study and real data application. Our results show that, in general, the new test is more powerful; the gain in detecting power can be substantial in many situations, compared to other methods.

7.
Genetic association is often determined in case-control studies by the differential distribution of alleles or genotypes. Recent work has demonstrated that association can also be assessed by deviations from the expected distributions of alleles or genotypes. Specifically, multiple methods motivated by the principles of Hardy-Weinberg equilibrium (HWE) have been developed. However, these methods do not take into account many of the assumptions of HWE. Therefore, we have developed a prevalence-based association test (PRAT) as an alternative method for detecting association in case-control studies. This method, also motivated by the principles of HWE, uses an estimated population allele frequency to generate expected genotype frequencies instead of using the case and control frequencies separately. Our method often has greater power, under a wide variety of genetic models, to detect association than genotypic, allelic or Cochran-Armitage trend association tests. Therefore, we propose PRAT as a powerful alternative method of testing for association.

8.
In current genome‐wide association studies (GWAS), the analysis is usually focused on autosomal variants only, and the sex chromosomes are often neglected. Recently, a number of technical hurdles have been described that add to the reluctance to include chromosome X in a GWAS, including complications in genotype calling, imputation, and selection of test statistics. To overcome this, we provide a “how to” guide for analyzing X chromosomal data within a standard GWAS. Following a general pipeline for GWAS, we highlight the steps in which the X chromosome requires specific attention, and we give tentative advice for each of these. Through this, we show that by selection of sensible algorithms and parameter settings, the inclusion of chromosome X in GWAS is manageable. Closing this gap is expected to further elucidate the genetic background of complex diseases, especially of those with sex‐specific features.

9.
Family‐based designs enriched with affected subjects and disease‐associated variants can increase statistical power for identifying functional rare variants. However, few rare variant analysis approaches are available for time‐to‐event traits in family designs, and none of them is applicable to the X chromosome. We developed novel pedigree‐based burden and kernel association tests for time‐to‐event outcomes with right censoring for pedigree data, referred to as FamRATS (family‐based rare variant association tests for survival traits). Cox proportional hazard models were employed to relate a time‐to‐event trait with rare variants with flexibility to encompass all ranges and collapsing of multiple variants. In addition, robustness to violations of the proportional hazard assumption was investigated for the proposed tests and four existing tests, including the conventional population‐based Cox proportional model and the burden, kernel, and sum of squares statistic (SSQ) tests for family data. The proposed tests can be applied to large‐scale whole‐genome sequencing data. They are appropriate for practical use under a wide range of misspecified Cox models, as well as for population‐based, pedigree‐based, or hybrid designs. In our extensive simulation study and data example, we showed that the proposed kernel test is the most powerful and robust choice among the proposed burden test and the four existing rare variant survival association tests. When applied to the Diabetes Heart Study, the proposed tests found that exome variants of the JAK1 gene on chromosome 1 showed the most significant association with age at onset of type 2 diabetes in the exome‐wide analysis.

10.
The large number of markers considered in a genome‐wide association study (GWAS) has resulted in a simplification of analyses conducted. Most studies are analyzed one marker at a time using simple tests like the trend test. Methods that account for the special features of genetic association studies, yet remain computationally feasible for genome‐wide analysis, are desirable as they may lead to increased power to detect associations. Haplotype sharing attempts to translate between population genetics and genetic epidemiology. Near a recent mutation that increases disease risk, haplotypes of case participants should be more similar to each other than haplotypes of control participants; conversely, the opposite pattern may be found near a recent mutation that lowers disease risk. We give computationally simple association tests based on haplotype sharing that can be easily applied to GWASs while allowing use of fast (but not likelihood‐based) haplotyping algorithms and properly accounting for the uncertainty introduced by using inferred haplotypes. We also give haplotype‐sharing analyses that adjust for population stratification. Applying our methods to a GWAS of Parkinson's disease, we find a genome‐wide significant signal in the CAST gene that is not found by single‐SNP methods. Further, a missing‐data artifact that causes a spurious single‐SNP association on chromosome 9 does not impact our test. Genet. Epidemiol. 33:657–667, 2009. Published 2009 Wiley‐Liss, Inc.

11.
We study the problem of testing for single marker‐multiple phenotype associations based on genome‐wide association study (GWAS) summary statistics without access to individual‐level genotype and phenotype data. For most published GWASs, obtaining summary data is substantially easier than accessing individual‐level phenotype and genotype data, and multiple correlated traits have often been collected, so the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta‐analyzed GWAS dataset with three blood lipid traits and another with sex‐stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta‐analyzed) genome‐wide summary statistics, then extend the method to meta‐analysis of multiple sets of genome‐wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as being more powerful than, or complementary to, existing methods.

12.
13.
Han F, Pan W. Genetic Epidemiology, 2010, 34(7): 680-688
To detect genetic association with common and complex diseases, many statistical tests have been proposed for candidate gene or genome-wide association studies with the case-control design. Due to linkage disequilibrium (LD), multi-marker association tests can gain power over single-marker tests with a Bonferroni multiple testing adjustment. Among many existing multi-marker association tests, most aim to detect only one of many possible aspects of distributional differences between the genotypes of cases and controls, such as allele frequency differences, while a few new ones aim to target two or three aspects, all of which can be implemented in logistic regression. In contrast to logistic regression, a genomic distance-based regression (GDBR) approach aims to detect some high-order genotypic differences between cases and controls. A recent study has confirmed the high power of GDBR tests. At this moment, the popular logistic regression and the emerging GDBR approaches are completely unrelated; for example, one has to choose between the two. In this article, we reformulate GDBR as logistic regression, opening an avenue to constructing other powerful tests while overcoming some limitations of GDBR. For example, asymptotic distributions can replace time-consuming permutations for deriving P-values, and covariates, including gene-gene interactions, can be easily incorporated. Importantly, this reformulation facilitates combining GDBR with other existing methods in a unified framework of logistic regression. In particular, we show that Fisher's P-value combining method can boost statistical power by incorporating information from allele frequencies, Hardy-Weinberg disequilibrium, LD patterns, and other higher-order interactions among multi-markers as captured by GDBR.
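
Fisher's P-value combining method, mentioned at the end of this abstract, can be sketched as follows. The sketch assumes the component P-values are independent under the null hypothesis, which is the textbook setting for Fisher's method. How the authors handle dependence between component tests is not stated in the abstract, and the function name is hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has a closed form:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    All P-values must be strictly greater than zero.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals x / 2
    term = 1.0    # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total

# Illustrative P-values only, e.g. from allele-frequency, HWD and LD-based tests.
print(fisher_combined_p([0.04, 0.10, 0.20]))
```

A single P-value is returned unchanged, which is a quick sanity check on the closed-form tail.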

14.
Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

15.
Emily M. Statistics in Medicine, 2012, 31(21): 2359-2373
Epistasis is often cited as the biological mechanism carrying the missing heritability in genome‐wide association studies. However, very few such studies have been reported in the literature. The low power of existing statistical methods is a potential explanation. Statistical procedures are also mainly based on the statistical definition of epistasis, which prevents them from detecting SNP–SNP interactions that rely on some classes of epistatic models. In this paper, we propose a new statistic, called IndOR for independence‐based odds ratio, based on the biological definition of epistasis. We assume that epistasis modifies the dependency between the two causal SNPs, and we develop a Wald procedure to test this hypothesis. Our new statistic is compared with three statistical procedures in a large power study on simulated data sets. We use extensive simulations, based on 45 scenarios, to investigate the effect of three factors: the underlying disease model, the linkage disequilibrium, and the control‐to‐case ratio. We demonstrate that our new test has the ability to detect a wider range of epistatic models. Furthermore, our new statistical procedure is remarkably powerful when the two loci are linked and when the control‐to‐case ratio is higher than 1. The application of our new statistic to the Wellcome Trust Case Control Consortium data set on Crohn's disease reinforces our results on simulated data. Our new test, IndOR, detects a previously reported interaction with more power. Furthermore, a new combination of variants has been detected by our new test as significantly associated with Crohn's disease. Copyright © 2012 John Wiley & Sons, Ltd.

16.
During the last decade genome-wide association studies have proven to be a powerful approach to identifying disease-causing variants. However, for admixed populations, most current methods for association testing are based on the assumption that the effect of a genetic variant is the same regardless of its ancestry. This is a reasonable assumption for a causal variant but may not hold for the genetic variants that are tested in genome-wide association studies, which are usually not causal. The effects of noncausal genetic variants depend on how strongly their presence correlates with the presence of the causal variant, which may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect size is allowed to depend on the ancestry of a given allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a substantial increase in statistical power to detect associations. In addition, the method allows for testing for difference in effect size between ancestral populations, which can be used to help determine if a given genetic variant is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.

17.
Genome‐wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype‐phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping‐based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome‐wide significance thresholds for various analysis scenarios. Using whole‐genome sequence data, we simulated sequencing‐based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome‐wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome‐wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
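
One way to read thresholds of this size is as a Bonferroni-style correction. This is an interpretive assumption, since the abstract states that the thresholds were determined empirically by simulation. Taking a family-wise error rate of α = 0.05 (also an assumption) and writing M_eff for the effective number of independent tests (a symbol introduced here for illustration), the two thresholds correspond to:

```latex
\alpha_{\mathrm{gw}} = \frac{\alpha}{M_{\mathrm{eff}}}
\quad\Longrightarrow\quad
M_{\mathrm{eff}} = \frac{0.05}{5 \times 10^{-9}} = 10^{7}
\qquad\text{and}\qquad
M_{\mathrm{eff}} = \frac{0.05}{1 \times 10^{-9}} = 5 \times 10^{7}.
```

By the same arithmetic, the conventional array-based GWAS threshold of 5 × 10⁻⁸ corresponds to about 10⁶ effective tests, so the thresholds quoted for sequencing studies imply roughly 10 to 50 times as many.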

18.
The purpose of this work is the development of linear trend tests that allow for error (LTT ae), specifically incorporating double-sampling information on phenotypes and/or genotypes. We use a likelihood framework. Misclassification errors are estimated via double sampling. Unbiased estimates of penetrances and genotype frequencies are determined through application of the Expectation-Maximization algorithm. We perform simulation studies to evaluate false-positive rates for various genotype classification weights (recessive, dominant, additive). We compare simulated power between the LTT ae and its genotypic test equivalent, the LRT ae, in the presence of phenotype and genotype misclassification, to evaluate power gains of the LTT ae for multi-locus haplotype association with a dominant mode of inheritance. Finally, we apply LTT ae and a method without double-sample information (LTT std) to double-sampled phenotype data for an actual Alzheimer's disease (AD) case-control study with ApoE genotypes. Simulation results suggest that the LTT ae maintains correct false-positive rates in the presence of misclassification. For power simulations, the LTT ae method is at least as powerful as LRT ae method, with a maximum power gain of 0.42 over the LRT ae method for certain parameter settings. For AD data, LTT ae provides more significant evidence for association (permutation p=0.0522) than LTT std (permutation p=0.1684). This is due to observed phenotype misclassification. The LTT ae statistic enables researchers to apply linear trend tests to case-control genetic data, increasing power to detect association in the presence of misclassification. If the disease MOI is known, LTT ae methods are usually more powerful due to the fact that the statistic has fewer degrees of freedom.

19.
For a dense set of genetic markers such as single nucleotide polymorphisms (SNPs) in high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive in reducing the data complexity and increasing the statistical power. However, because the status of the underlying disease variant is unknown, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptibility locus (DSL) that may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, by using a retrospective likelihood to adjust for the case-control sampling ascertainment and appropriately handle the Hardy-Weinberg equilibrium constraint, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and estimate the joint haplotype frequencies of the DSL and markers simultaneously. With the latent variable to describe a flexible role of the DSL, the likelihood ratio statistic can then provide a joint association test for the set of markers without requiring an adjustment for testing of multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios compared with classical haplotype association methods.

20.
Genome‐wide association studies (GWAS) for complex diseases have focused primarily on single‐trait analyses for disease status and disease‐related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL‐cholesterol, HDL‐cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual‐level data. Here, we develop metaUSAT (where USAT is unified score‐based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual‐level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P‐value for association and is computationally efficient for implementation at a genome‐wide level. Simulation experiments show that metaUSAT maintains proper type‐I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. When applied to plasma lipids summary data from the METSIM and the T2D‐GENES studies, metaUSAT detected genome‐wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.
