Similar articles
1.
A major challenge in genome‐wide association studies (GWASs) is to derive the multiple testing threshold when hypothesis tests are conducted using a large number of single nucleotide polymorphisms. Permutation tests are considered the gold standard in multiple testing adjustment in genetic association studies. However, they are computationally intensive, especially for GWASs, and can be impractical if a large number of random shuffles are used to ensure accuracy. Many researchers have developed approximation algorithms to relieve the computing burden imposed by permutation. One particularly attractive alternative to permutation is to calculate the effective number of independent tests, Meff, which has been shown to be promising in genetic association studies. In this study, we compare recently developed Meff methods and validate them by the permutation test with 10,000 random shuffles using two real GWAS data sets: an Illumina 1M BeadChip and an Affymetrix GeneChip® Human Mapping 500K Array Set. Our results show that the simpleM method produces the best approximation of the permutation threshold, and it does so in the shortest amount of time. We also show that Meff is indeed valid on a genome‐wide scale in these data sets based on statistical theory and significance tests. The significance thresholds derived can provide practical guidelines for other studies using similar population samples and genotyping platforms. Genet. Epidemiol. 34:100–105, 2010. © 2009 Wiley‐Liss, Inc.
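As a rough illustration of how an effective number of independent tests turns into a significance threshold, the sketch below applies the standard Bonferroni and Šidák formulas to a given Meff. The function names and the Meff value in the usage note are hypothetical; the abstract does not report them, and the estimation of Meff itself (for example by simpleM) is not shown.

```python
import math


def bonferroni_threshold(alpha: float, m_eff: float) -> float:
    """Per-test threshold alpha / Meff (Bonferroni applied to Meff tests)."""
    return alpha / m_eff


def sidak_threshold(alpha: float, m_eff: float) -> float:
    """Per-test threshold 1 - (1 - alpha)**(1 / Meff) (Sidak correction).

    Written with log1p/expm1 so that very small thresholds keep precision.
    """
    return -math.expm1(math.log1p(-alpha) / m_eff)
```

For instance, with an assumed Meff of 500,000 and a family-wise error rate of 0.05, `bonferroni_threshold(0.05, 500_000)` gives 1e-7, and the Šidák threshold is slightly less strict.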

2.
Whole genome association studies are generating data sets with hundreds of thousands of markers genotyped on thousands of cases and controls. We show that whole genome haplotypic association testing with permutation to account for multiple testing is statistically powerful and computationally feasible on such data, using an efficient software implementation of a recently proposed method. We use realistic simulations to explore the statistical properties of the method, and show that for ungenotyped disease-susceptibility variants with population frequencies of 5% or less the haplotypic tests have markedly better power than single-marker tests. We propose a combined single-marker and haplotypic strategy, in which both single-marker and haplotypic tests are applied, with the minimum P-value adjusted for multiple testing by permutation, which results in a test that is powerful for detecting both low- and high-frequency disease-susceptibility variants.
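The idea of adjusting the minimum P-value by permutation can be sketched generically as follows. This is a minimal illustration, not the software described in the abstract: `min_p_adjusted` and `z_test_pvalues` are hypothetical names, and the simple two-sample z-test stands in for whatever single-marker and haplotypic tests are actually combined.

```python
import math
import random


def z_test_pvalues(markers, labels):
    """Two-sided normal-approximation P-value per marker (toy stand-in test)."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    pvals = []
    for g in markers:
        mean = sum(g) / len(g)
        var = sum((x - mean) ** 2 for x in g) / (len(g) - 1)
        if var == 0:
            pvals.append(1.0)  # monomorphic marker carries no signal
            continue
        m1 = sum(x for x, y in zip(g, labels) if y == 1) / n1
        m0 = sum(x for x, y in zip(g, labels) if y == 0) / n0
        z = (m1 - m0) / math.sqrt(var * (1 / n1 + 1 / n0))
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))
    return pvals


def min_p_adjusted(pvalue_fn, labels, n_perm=1000, seed=0):
    """Permutation-adjusted P-value for the minimum P-value over all tests.

    pvalue_fn(labels) must return the list of P-values for one labelling.
    """
    rng = random.Random(seed)
    observed_min = min(pvalue_fn(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        if min(pvalue_fn(shuffled)) <= observed_min:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because every permutation recomputes all tests on the same shuffled labels, the correlation between tests is carried into the null distribution of the minimum P-value automatically.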

3.
After genetic regions have been identified in genomewide association studies (GWAS), investigators often follow up with more targeted investigations of specific regions. These investigations typically are based on single nucleotide polymorphisms (SNPs) with dense coverage of a region. Methods are thus needed to test the hypothesis of any association in given genetic regions. Several approaches for combining P‐values obtained from individual SNP hypothesis tests are available. We recently proposed a sequential procedure for testing the global null hypothesis of no association in a region. When this global null hypothesis is rejected, this method provides a list of significant hypotheses and has weak control of the family‐wise error rate. In this paper, we devise a permutation‐based version of the test that accounts for correlations of tests based on SNPs in the same genetic region. Based on simulated data, the method has correct control of the type I error rate and higher or comparable power to other tests.

4.
Genome-wide association studies typically test large numbers of genetic variants in association with trait values. It is well known that linkage disequilibrium (LD) between nearby markers tends to introduce correlation among association tests. Failure to properly adjust for multiple comparisons can lead to false-positive results or missed true-positive signals. The Bonferroni correction is generally conservative in the presence of LD. The permutation procedure, although widely employed to adjust for correlated tests, is not applicable when related individuals are included in case-control samples. With related individuals, the dependence among relatives' genotypes can also contribute to the correlation between tests. We present a new method, P_norm, to correct for multiple hypothesis testing in case-control association studies in which some individuals are related. The adjustment with P_norm simultaneously accounts for two sources of correlation among the test statistics: (1) LD among genetic markers and (2) dependence among genotypes across related individuals. Using simulated data based on the International HapMap Project, we demonstrate that it has better control of type I error and is more powerful than some of the recently developed methods. We apply the method to a genome-wide association study of alcoholism in the GAW 14 COGA data set and detect genome-wide significant association.

5.
Large-scale genome-wide association studies (GWAS) have become feasible recently because of the development of bead and chip technology. However, the success of GWAS partially depends on the statistical methods that are able to manage and analyze this sort of large-scale data. Currently, the commonly used tests for GWAS include the Cochran-Armitage trend test, the allelic χ² test, the genotypic χ² test, the haplotypic χ² test, and the multi-marker genotypic χ² test among others. From a methodological point of view, it is a great challenge to improve the power of commonly used tests, since these tests are commonly used precisely because they are already among the most powerful tests. In this article, we propose an improved score test that is uniformly more powerful than the score test based on the generalized linear model. Since the score test based on the generalized linear model includes the aforementioned commonly used tests as its special cases, our proposed improved score test is thus uniformly more powerful than these commonly used tests. We evaluate the performance of the improved score test by simulation studies and application to a real data set. Our results show that the power increases of the improved score test over the score test cannot be neglected in most cases.

6.
Meta-analysis has become a key component of well-designed genetic association studies due to the boost in statistical power achieved by combining results across multiple samples of individuals and the need to validate observed associations in independent studies. Meta-analyses of genetic association studies based on multiple SNPs and traits are subject to the same multiple testing issues as single-sample studies, but it is often difficult to adjust accurately for the multiple tests. Procedures such as Bonferroni may control the type-I error rate but will generally provide an overly harsh correction if SNPs or traits are correlated. Depending on study design, availability of individual-level data, and computational requirements, permutation testing may not be feasible in a meta-analysis framework. In this article, we present methods for adjusting for multiple correlated tests under several study designs commonly employed in meta-analyses of genetic association tests. Our methods are applicable to both prospective meta-analyses in which several samples of individuals are analyzed with the intent to combine results, and retrospective meta-analyses, in which results from published studies are combined, including situations in which (1) individual-level data are unavailable, and (2) different sets of SNPs are genotyped in different studies due to random missingness or two-stage design. We show through simulation that our methods accurately control the rate of type I error and achieve improved power over multiple testing adjustments that do not account for correlation between SNPs or traits.

7.
In many applications of linear mixed-effects models to longitudinal and multilevel data, especially from medical studies, it is of interest to test whether random effects are needed in the model. It is known that classical tests such as the likelihood ratio, Wald, and score tests are not suitable for testing random effects because they suffer from testing on the boundary of the parameter space. Instead, permutation and bootstrap tests as well as Bayesian tests, which do not rely on the asymptotic distributions, avoid issues with the boundary of the parameter space. In this paper, we first develop a permutation test based on the likelihood ratio test statistic, which can be easily used for testing multiple random effects and any subset of them in linear mixed-effects models. The proposed permutation test extends two existing permutation tests. We then compare permutation tests and Bayesian tests for random effects to find out which test is more powerful in which situation. Nothing is known about this in the literature, although it is an important practical problem given the usefulness of both methods in tackling the challenges of testing random effects. For this comparison, we consider a Bayesian test developed using Bayes factors, for which we also propose a new alternative computation that avoids a computational issue it encounters in testing multiple random effects. Extensive simulations and a real data analysis are used to evaluate the proposed permutation test and compare it with the Bayesian test. We find that both tests perform well, although the permutation test with the likelihood ratio statistic tends to provide relatively higher power when testing multiple random effects.

8.
Genome-wide significance for dense SNP and resequencing data   Cited by: 12 (self-citations: 0; other citations: 12)
The problem of multiple testing is an important aspect of genome-wide association studies, and will become more important as marker densities increase. The problem has been tackled with permutation and false discovery rate procedures and with Bayes factors, but each approach faces difficulties that we briefly review. In the current context of multiple studies on different genotyping platforms, we argue for the use of truly genome-wide significance thresholds, based on all polymorphisms whether or not typed in the study. We approximate genome-wide significance thresholds in contemporary West African, East Asian and European populations by simulating sequence data, based on all polymorphisms as well as for a range of single nucleotide polymorphism (SNP) selection criteria. Overall we find that significance thresholds vary by a factor of >20 over the SNP selection criteria and statistical tests that we consider and can be highly dependent on sample size. We compare our results for sequence data to those derived by the HapMap Consortium and find notable differences which may be due to the small sample sizes used in the HapMap estimate.

9.
Most genome-wide association studies (GWAS) are restricted to one phenotype, even if multiple related or unrelated phenotypes are available. However, an integrated analysis of multiple phenotypes can provide insight into their shared genetic basis and may improve the power of association studies. We present a new method, called "phenotype set enrichment analysis" (PSEA), which uses ideas of gene set enrichment analysis for the investigation of phenotype sets. PSEA combines statistics of univariate phenotype analyses and tests by permutation. It allows not only the analysis of predefined phenotype sets but also the identification of new phenotype sets. Apart from the application to situations where phenotypes and genotypes are available for each person, the method was adapted to the analysis of GWAS summary statistics. PSEA was applied to data from the population-based cohort KORA F4 (N = 1,814) using iron-related and blood count traits. By confirming associations previously found in large meta-analyses on these traits, PSEA was shown to be a reliable tool. Many of these associations were not detectable by GWAS on single phenotypes in KORA F4. Therefore, the results suggest that PSEA can be more powerful than a single phenotype GWAS for the identification of association with multiple phenotypes. PSEA is a valuable method for analysis of multiple phenotypes, which can help to understand phenotype networks. Its flexible design enables both the use of prior knowledge and the generation of new knowledge on the connection of multiple phenotypes. A software program for PSEA based on GWAS results is available upon request.

10.
Genetic association is often determined in case-control studies by the differential distribution of alleles or genotypes. Recent work has demonstrated that association can also be assessed by deviations from the expected distributions of alleles or genotypes. Specifically, multiple methods motivated by the principles of Hardy-Weinberg equilibrium (HWE) have been developed. However, these methods do not take into account many of the assumptions of HWE. Therefore, we have developed a prevalence-based association test (PRAT) as an alternative method for detecting association in case-control studies. This method, also motivated by the principles of HWE, uses an estimated population allele frequency to generate expected genotype frequencies instead of using the case and control frequencies separately. Our method often has greater power, under a wide variety of genetic models, to detect association than genotypic, allelic or Cochran-Armitage trend association tests. Therefore, we propose PRAT as a powerful alternative method of testing for association.
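The step of generating expected genotype frequencies from a population allele frequency under HWE can be sketched as below. This is only the textbook Hardy-Weinberg expectation with a generic Pearson-type comparison; it is not the PRAT statistic, whose exact form the abstract does not give, and the function names are hypothetical.

```python
def hwe_expected_counts(allele_freq: float, n: int) -> dict:
    """Expected genotype counts among n individuals under HWE.

    allele_freq is the assumed population frequency p of allele A.
    """
    p = allele_freq
    q = 1.0 - p
    return {"AA": n * p * p, "Aa": 2 * n * p * q, "aa": n * q * q}


def chi_square_vs_expected(observed: dict, expected: dict) -> float:
    """Pearson-type sum of (observed - expected)^2 / expected over genotypes."""
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in expected)
```

With p = 0.5 and 100 individuals the expected counts are 25, 50 and 25, so observed counts of 30, 40 and 30 give a statistic of 4.0.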

11.
To date, thousands of genetic variants associated with numerous human traits and diseases have been identified by genome-wide association studies (GWASs). GWASs focus on testing the association between a single trait and genetic variants. However, the analysis of multiple traits and single nucleotide polymorphisms (SNPs) might reflect the physiological processes of complex diseases; the corresponding study is called pleiotropy association analysis. Modern GWASs report only summary statistics instead of individual-level phenotype and genotype data to avoid logistical and privacy issues. Existing methods for combining GWAS summary statistics across multiple phenotypes mainly focus on low-dimensional phenotypes and lose power in high-dimensional cases. To overcome this defect, we propose two kinds of truncated tests to combine summary statistics across multiple phenotypes. Extensive simulations show that the proposed methods are robust and powerful when the dimension of the phenotypes is high and only part of the phenotypes are associated with the SNPs. We apply the proposed methods to blood cytokine data collected from a Finnish population. Results show that the proposed tests can identify additional genetic markers that are missed by single trait analysis.

12.
Genome-wide association studies (GWAS) have thus far achieved substantial success. In the last decade, a large number of common variants underlying complex diseases have been identified through GWAS. In most existing GWAS, the identified common variants are obtained by single marker-based tests, that is, testing one single-nucleotide polymorphism (SNP) at a time. Generally, the basic functional unit of inheritance is a gene, rather than a SNP. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation. In this paper, we propose a general gene-based p-value adaptive combination approach (GPA) which can integrate association evidence of multiple genetic variants using only GWAS summary statistics (either p-values or other test statistics). The proposed method can be used to test genetic association for both continuous and binary traits, in a single study or across multiple studies, which helps to overcome the limitation of existing methods that can only be applied to a specific type of data. We conducted thorough simulation studies to verify that the proposed method controls type I errors well, and performs favorably compared to single-marker analysis and other existing methods. We demonstrated the utility of our proposed method through analysis of GWAS meta-analysis results for fasting glucose and lipids from the international MAGIC consortium and Global Lipids Consortium, respectively. The proposed method identified some novel trait-associated genes which can improve our understanding of the mechanisms involved in β-cell function, glucose homeostasis, and lipid traits.

13.
Estimation of significance thresholds for genomewide association scans   Cited by: 5 (self-citations: 0; other citations: 5)
The question of what significance threshold is appropriate for genomewide association studies is somewhat unresolved. Previous theoretical suggestions have yet to be validated in practice, whereas permutation testing does not resolve a discrepancy between the genomewide multiplicity of the experiment and the subset of markers actually tested. We used genotypes from the Wellcome Trust Case-Control Consortium to estimate a genomewide significance threshold for the UK Caucasian population. We subsampled the genotypes at increasing densities, using permutation to estimate the nominal P-value for 5% family-wise error. By extrapolating to infinite density, we estimated the genomewide significance threshold to be about 7.2 × 10⁻⁸. To reduce the computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P-value from permutation replicates, we showed that the effective number is a useful heuristic and suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population.
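One way to read the Beta-fitting step: if n tests were independent, the minimum P-value would follow a Beta(1, n) distribution, so fitting that family to the minimum P-values from permutation replicates yields an "effective" n. The sketch below assumes the first shape parameter is fixed at 1 and uses the closed-form maximum-likelihood estimate; the abstract does not say how the fit was actually done, and the function name is hypothetical.

```python
import math


def effective_tests_from_min_p(min_pvalues):
    """Maximum-likelihood n for min P ~ Beta(1, n).

    The density is n * (1 - m)**(n - 1), so the log-likelihood over k
    replicates is k*log(n) + (n - 1) * sum(log(1 - m_i)), which is
    maximised at n = -k / sum(log(1 - m_i)).
    """
    log_sum = sum(math.log1p(-m) for m in min_pvalues)
    return -len(min_pvalues) / log_sum
```

Applied to minimum P-values simulated from 100 truly independent uniform tests, the estimate should land close to 100; with correlated markers it would fall below the number of markers tested.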

14.
Genomewide association studies are an exciting strategy in genetics, recently becoming feasible and harvesting many novel genes linked to multiple phenotypes. Determining the significance of results in the face of testing a genomewide set of multiple hypotheses, most of which are producing noisy, null-distributed association signals, presents a challenge to the wide community of association researchers. Rather than each study engaging in independent evaluation of significance standards, we have undertaken the task of developing such standards for genomewide significance, based on data collected by the International Haplotype Map Consortium. We report an estimated testing burden of a million independent tests genomewide in Europeans, and twice that number in Africans. We further identify the sensitivity of the testing burden to the required significance level, with implications for the staged design of association studies.

15.
We study the problem of testing for single marker‐multiple phenotype associations based on genome‐wide association study (GWAS) summary statistics without access to individual‐level genotype and phenotype data. For most published GWASs, because obtaining summary data is substantially easier than accessing individual‐level phenotype and genotype data, while often multiple correlated traits have been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta‐analyzed GWAS dataset with three blood lipid traits and another with sex‐stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta‐analyzed) genome‐wide summary statistics, then extend the method to meta‐analysis of multiple sets of genome‐wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.

16.
Binary phenotypes commonly arise due to multiple underlying quantitative precursors and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype‐level methods, e.g., MultiPhen (O'Reilly et al. [2012]), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele‐level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case‐control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (Binomial regression‐based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a single‐nucleotide polymorphism (Distance‐based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with that of the genotype‐level test MultiPhen. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one or two binary traits under a recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two real data sets and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy‐Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE.
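The distance underlying DAMP can be illustrated with a small sketch that computes the squared Mahalanobis distance between the mean phenotype vectors of two allele groups. Several things here are assumptions for illustration: the restriction to two traits, the use of a pooled within-group covariance, and the function names. The abstract does not specify the covariance estimator or how the distance is turned into a test.

```python
def mean_vector(rows):
    """Mean of a list of (trait1, trait2) pairs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(group_a, group_b):
    """Pooled 2x2 within-group sample covariance (assumed estimator)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def mahalanobis_squared(group_a, group_b):
    """Squared Mahalanobis distance d' S^-1 d between the two group means."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    d = [ma[0] - mb[0], ma[1] - mb[1]]
    (a, b), (c, e) = pooled_covariance(group_a, group_b)
    det = a * e - b * c
    # Inverse of [[a, b], [c, e]] is (1/det) * [[e, -b], [-c, a]].
    return (d[0] * (e * d[0] - b * d[1]) + d[1] * (-c * d[0] + a * d[1])) / det
```

Here each group would hold the phenotype vectors attached to one allele at the SNP; a large distance indicates that the mean multivariate phenotype differs between alleles.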

17.
The haplotype-sharing correlation (HSC) method for association analysis using family data is revisited by introducing a permutation procedure for estimating region-wise significance at each marker on a study segment. In simulation studies, the HSC method has a correct type 1 error rate in both unstructured and structured populations. The HSC signals on disease segments occur in the vicinity of a true disease locus on a restricted region without recombination hotspots. However, the peak signal may not pinpoint the true disease location in a small region with dense markers. The HSC method is shown to have higher power than single- and multilocus family-based association test (FBAT) methods when the true disease locus is unobserved among the study markers, and especially under conditions of weak linkage disequilibrium and multiple ancestral disease alleles. These simulation results suggest that the HSC method has the capacity to identify true disease-associated segments under allelic heterogeneity that go undetected by the FBAT method that compares allelic or haplotypic frequencies.

18.
Genome‐wide association studies (GWAS) have been widely used to identify genetic effects on complex diseases or traits. Most currently used methods are based on separate single‐nucleotide polymorphism (SNP) analyses. Because this approach requires correction for multiple testing to avoid excessive false‐positive results, it suffers from reduced power to detect weak genetic effects under limited sample size. To increase the power to detect multiple weak genetic factors and reduce false‐positive results caused by multiple tests and dependence among test statistics, a modified forward multiple regression (MFMR) approach is proposed. Simulation studies show that MFMR has higher power than the Bonferroni and false discovery rate procedures for detecting moderate and weak genetic effects, and MFMR retains an acceptable false‐positive rate even if causal SNPs are correlated with many SNPs due to population stratification or other unknown reasons. Genet. Epidemiol. 33:518–525, 2009. © 2009 Wiley‐Liss, Inc.

19.
Linkage disequilibrium (LD) of genetic loci is routinely estimated and graphically illustrated in genetic association studies. It has been suggested that the information in LD is also useful for association mapping and genetic association can be detected by comparing LD patterns between cases and controls. Here, we extend this idea to analyze case‐parents data by comparing LD patterns between transmitted and nontransmitted genotypes. We provide the condition under which contrasting LD is valid for testing gene‐gene interactions. A permutation procedure is given to assess statistical significance. One advantage of our proposed methods is that haplotype information is not required. Thus, the implementation of our methods is straightforward and the resulting tests are free from potential bias caused by assumptions made to estimate haplotypes in silico. Since our test statistics use pairwise LD measurements, they are less affected by missing data than many other multilocus methods. With simulated data, we demonstrate that examining LD patterns of case‐parents data is a useful multilocus association mapping strategy and it complements existing association mapping methods. The application of our methods to a Crohn's disease data set shows that our methods can detect multilocus association that might be missed by other association methods. Our permutation procedure can also be modified to allow multiple offspring from a family to be analyzed. Genet. Epidemiol. 35:487–498, 2011. © 2011 Wiley‐Liss, Inc.
