Similar literature
1.
p‐Values from tests of significance can be combined using the Šidák correction (or the closely related Bonferroni correction) or Fisher's method, but both these methods require that the p‐values combined be independent when all null hypotheses tested are true. In this paper adjustments to these methods are proposed, using a new eigenvalue‐based measure of the effective number of independent tests to which the actual tests performed are equivalent, and are compared with adjustments proposed by previous authors. The adjusted methods are evaluated using a sample of 726 Alzheimer's disease (AD) cases and 707 group‐matched controls, genotyped at 84,975 single‐nucleotide polymorphism loci in 2,000 randomly chosen genes. The tests for genetic association with AD at loci within each gene are combined. The number of loci tested per gene varies from 2 to 994. The adjusted combined p‐values agree well with the significance of the combined p‐values determined empirically by random permutation of the data (Šidák correction: r=0.990; Fisher's method: r=0.994). This indicates that the combined p‐values can be used to assess the relative strength of evidence for association of these genes with AD. The adjustment proposed here is a refinement of that of Nyholt ([2004] Am. J. Hum. Genet. 74:765–769), giving improved agreement with the results of random permutation. The improvement obtained is similar to that given by the refinement proposed by Li and Ji ([2005] Heredity 95:221–227). It is concluded that the concept of an effective number of tests is a valid approximation that allows p‐values to be combined in a highly informative way. Genet. Epidemiol. 33:559–568, 2009. © 2009 Wiley‐Liss, Inc.
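The sketch below implements, in pure Python, the two earlier effective-number-of-tests adjustments that the abstract compares against (Nyholt 2004; Li and Ji 2005). It is not the author's new eigenvalue-based measure, which the abstract does not spell out. Some choices are assumptions made for illustration. The function names are hypothetical. Eigenvalues come from a small Jacobi routine so that no third-party library is needed. Nyholt's eigenvalue variance uses an M − 1 denominator. The Šidák step treats the smallest p-value as the minimum of M_eff independent tests. Fisher's method uses the closed-form chi-square tail for even degrees of freedom.

```python
import math
import statistics


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix (list of lists) by cyclic Jacobi rotations."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q of m @ P
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # rows p and q of P.T @ (m @ P)
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * mpk - s * mqk, s * mpk + c * mqk
    return sorted(m[i][i] for i in range(n))


def meff_nyholt(eigs):
    """Nyholt (2004): M_eff = 1 + (M - 1) * (1 - Var(lambda) / M)."""
    m = len(eigs)
    if m < 2:
        return float(m)
    return 1.0 + (m - 1) * (1.0 - statistics.variance(eigs) / m)


def meff_li_ji(eigs):
    """Li and Ji (2005): sum of f(|lambda|), with f(x) = I(x >= 1) + (x - floor(x))."""
    total = 0.0
    for lam in eigs:
        x = round(abs(lam), 8)  # guard against round-off just below an integer
        total += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return total


def sidak_min_p(p_values, m_eff):
    """Gene-level p-value from the smallest SNP p-value, treating M_eff tests as independent."""
    return 1.0 - (1.0 - min(p_values)) ** m_eff


def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(log p) ~ chi-square with 2k df under independence."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # closed-form upper tail for even df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))
```

For example, two SNPs with r = 0.5 give eigenvalues 0.5 and 1.5. Nyholt's formula then yields M_eff = 1.75, while Li and Ji's yields 2.0.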

2.
Multiple testing corrections for imputed SNPs (times cited: 1; self-citations: 0; citations by others: 1)
Gao X. Genetic Epidemiology, 2011, 35(3):154–158
Multiple testing corrections are an active research topic in genetic association studies, especially for genome-wide association studies (GWAS), in which tests of association with traits are now conducted at millions of imputed SNPs with estimated allelic dosages. Failure to address multiple comparisons appropriately can introduce excess false-positive results and make follow-up studies of those results inefficient. Permutation tests are considered the gold standard for multiple testing adjustment; however, this procedure is computationally demanding, especially for GWAS. Notably, permutation thresholds for the huge number of estimated allelic dosages in real data sets have not been reported. Although many researchers have recently developed algorithms that rapidly approximate permutation thresholds with accuracy similar to the permutation test, these methods have not been verified with estimated allelic dosages. In this study, we compare recently published multiple testing correction methods using 2.5M estimated allelic dosages. We also derive permutation significance levels based on 10,000 GWAS results under the null hypothesis of no association. Our results show that the simpleM method works well with estimated allelic dosages and gives the closest approximation to the permutation threshold while requiring the least computation time.
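As described by its authors, simpleM sets the effective number of tests to the number of leading eigenvalues of the SNP correlation (LD) matrix needed to explain 99.5% of the total variance. It then applies a Bonferroni-style threshold of α / M_eff. The minimal sketch below assumes the eigenvalues have already been computed, for example with a linear-algebra library. The function names are hypothetical, and 99.5% is taken as the default cutoff.

```python
def simplem_meff(eigenvalues, cutoff=0.995):
    """Number of leading eigenvalues whose cumulative sum reaches `cutoff` of the total."""
    vals = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)  # clip round-off negatives
    total = sum(vals)
    running = 0.0
    for count, v in enumerate(vals, start=1):
        running += v
        if running >= cutoff * total:
            return count
    return len(vals)


def simplem_threshold(eigenvalues, alpha=0.05, cutoff=0.995):
    """Per-test significance level alpha / M_eff."""
    return alpha / simplem_meff(eigenvalues, cutoff)
```

When the SNPs are uncorrelated, every eigenvalue equals 1 and M_eff equals the number of SNPs, so the threshold reduces to the ordinary Bonferroni level.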

3.
The large number of markers considered in a genome‐wide association study (GWAS) has resulted in a simplification of analyses conducted. Most studies are analyzed one marker at a time using simple tests like the trend test. Methods that account for the special features of genetic association studies, yet remain computationally feasible for genome‐wide analysis, are desirable as they may lead to increased power to detect associations. Haplotype sharing attempts to translate between population genetics and genetic epidemiology. Near a recent mutation that increases disease risk, haplotypes of case participants should be more similar to each other than haplotypes of control participants; conversely, the opposite pattern may be found near a recent mutation that lowers disease risk. We give computationally simple association tests based on haplotype sharing that can be easily applied to GWASs while allowing use of fast (but not likelihood‐based) haplotyping algorithms and properly accounting for the uncertainty introduced by using inferred haplotypes. We also give haplotype‐sharing analyses that adjust for population stratification. Applying our methods to a GWAS of Parkinson's disease, we find a genome‐wide significant signal in the CAST gene that is not found by single‐SNP methods. Further, a missing‐data artifact that causes a spurious single‐SNP association on chromosome 9 does not impact our test. Genet. Epidemiol. 33:657–667, 2009. Published 2009 Wiley‐Liss, Inc.

4.
Chronic obstructive pulmonary disease (COPD) is a progressive disease with both environmental and genetic risk factors. Genome‐wide association studies (GWAS) have identified multiple genomic regions influencing risk of COPD. To investigate the genetic etiology of COPD thoroughly, however, it is also important to explore the role of copy number variants (CNVs), because structural variants can alter gene expression and can be causal for some diseases. Here, we investigated the effects of polymorphic CNVs on quantitative measures of pulmonary function and chest computed tomography (CT) phenotypes among subjects enrolled in COPDGene, a multisite study. COPDGene subjects are roughly one‐third African American (AA) and two‐thirds non‐Hispanic white adult smokers (with or without COPD). We estimated CNVs with PennCNV for 9,076 COPDGene subjects genotyped on Illumina's Omni‐Express genome‐wide marker array. Within each racial group, we tested for association between polymorphic CNV components (defined as disjoint intervals of copy number regions) and several quantitative phenotypes associated with COPD. Among the AAs, we identified a polymorphic CNV on chromosome 5q35.2, located between two genes (FAM153B and SIMK1) but also harboring several pseudogenes, that reached genome‐wide significance for association with total lung capacity (TLCCT) as measured by chest CT scans. This is the first genome‐wide study of association between polymorphic CNVs and TLCCT. Although the second cohort, ARIC, did not have TLCCT measurements, we found similar counts of CNV deletions and amplifications among its AA and European subjects.
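One plausible reading of "polymorphic CNV components (defined as disjoint intervals of copy number regions)" is as follows. Overlapping CNV calls from all subjects are cut at every call boundary, and the resulting non-overlapping segments are kept, each with a count of the subjects whose calls cover it. The sketch below does that. It assumes half-open [start, end) coordinates on a single chromosome, and the function name is hypothetical rather than part of PennCNV.

```python
def cnv_components(calls):
    """Split CNV calls into disjoint components.

    calls: iterable of (subject_id, start, end), half-open [start, end) on one chromosome.
    Returns (start, end, n_carriers) for every elementary segment covered by >= 1 call.
    """
    calls = list(calls)
    breakpoints = sorted({pos for _, start, end in calls for pos in (start, end)})
    components = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # no breakpoint falls strictly inside [left, right), so a call either covers it fully or not at all
        carriers = {sid for sid, start, end in calls if start <= left and right <= end}
        if carriers:
            components.append((left, right, len(carriers)))
    return components
```

Each component could then be coded per subject (for example, as deletion or amplification status) and tested against a quantitative phenotype such as TLCCT.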

5.
In the past decade, many genome‐wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case‐control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but also collect extensive data on several risk factors and traits. Recent literature and grant proposals point toward a trend of reusing existing large case‐control data to explore genetic associations with additional traits (secondary phenotypes, Y) collected during the study. These secondary phenotypes may be correlated, and a proper analysis warrants a multivariate approach. Commonly used multivariate methods are not equipped to account properly for the non‐random sampling scheme. Current ad hoc practices include analyses without any adjustment and analyses with D included as a covariate. Our theoretical and empirical studies suggest that the type I error for testing genetic association with secondary traits can be substantially inflated when both X and Y are associated with D, even when there is no association between X and Y in the underlying (target) population. Whether using D as a covariate helps maintain the type I error rate depends heavily on the disease mechanism and the underlying causal structure (which is often unknown). To avoid grossly incorrect inference, we propose a proportional odds model adjusted for the propensity score (POM‐PS). It uses a proportional odds logistic regression of X on Y and includes the estimated conditional probability of being diseased as a covariate. We demonstrate the validity and advantages of POM‐PS and compare it with some existing methods in extensive simulation experiments mimicking plausible scenarios of dependency among Y, X, and D. Finally, we use POM‐PS to jointly analyze four adiposity traits using a type 2 diabetes (T2D) case‐control sample from the population‐based Metabolic Syndrome in Men (METSIM) study.
Only POM‐PS analysis of the T2D case‐control sample seems to provide valid association signals.

6.
There is emerging interest in sequencing‐based association studies of multiple rare variants. Most association tests suggested in the literature involve collapsing rare variants, with or without weighting. Recently, a variance‐component score test, the sequence kernel association test (SKAT), was proposed to address the limitations of collapsing methods. Although SKAT was shown to outperform most alternative tests, its applicability and power may be limited by missing genotypes. In this paper, we suggest a new method based on testing whether the fraction of causal variants in a region is zero. The new association test, T_REM, is derived from a random‐effects model, allows for missing genotypes, and does not require a choice of weighting function when common and rare variants are analyzed simultaneously. We performed simulations to study the type I error rates and power of four competing tests under various conditions on the sample size, genotype missing rate, variant frequency, effect directionality, and the numbers of non‐causal rare variants and/or causal common variants. The simulation results showed that T_REM was a valid test and was less sensitive to the inclusion of non‐causal rare variants and/or low‐effect common variants, or to the presence of missing genotypes. When the effects were more consistently in the same direction, T_REM also had better power. Finally, an application to the Shanghai Breast Cancer Study showed that rare causal variants in the FGFR2 gene were detected by both T_REM and SKAT, but T_REM produced more consistent results across different sets of rare and common variants. Copyright © 2013 John Wiley & Sons, Ltd.
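For reference, the SKAT statistic that T_REM is compared against is a weighted sum of squared per-variant score statistics. For a continuous trait without covariates, Q = Σ_j w_j (Σ_i g_ij (y_i − ȳ))². The sketch below computes Q and uses a permutation p-value in place of SKAT's analytic mixture-of-chi-squares p-value. It is not an implementation of T_REM. The weights are simply passed in; the SKAT authors suggest weights based on a Beta(1, 25) density evaluated at each variant's minor allele frequency. The sketch also assumes complete genotypes, so it does not address the missing-data problem that motivates T_REM. The function names are hypothetical.

```python
import random


def skat_q(genotypes, phenotype, weights):
    """SKAT-type statistic for a continuous trait with no covariates.

    genotypes: one row per subject, one 0/1/2 entry per variant (no missing values).
    Q = sum_j w_j * S_j**2, where S_j = sum_i g_ij * (y_i - mean(y)).
    """
    ybar = sum(phenotype) / len(phenotype)
    resid = [y - ybar for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(row[j] * r for row, r in zip(genotypes, resid))
        q += w * score ** 2
    return q


def skat_permutation_pvalue(genotypes, phenotype, weights, n_perm=999, seed=1):
    """Permutation p-value: shuffle phenotypes across subjects and recompute Q."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, phenotype, weights)
    y = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if skat_q(genotypes, y, weights) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```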

7.
Genome‐wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype‐phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping‐based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome‐wide significance thresholds for various analysis scenarios. Using whole‐genome sequence data, we simulated sequencing‐based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome‐wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome‐wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
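Suppose these thresholds are read as Bonferroni corrections at a family-wise error rate of α = 0.05. This is an interpretive assumption, since the abstract derives the thresholds empirically from simulation. Under that reading, they imply the effective numbers of independent tests below, shown alongside the conventional 5 × 10⁻⁸ for genotyping-based GWAS:

```latex
\[
M_{\mathrm{eff}} \approx \frac{\alpha}{P_{\mathrm{threshold}}}:\qquad
\frac{0.05}{5\times10^{-8}} = 10^{6},\qquad
\frac{0.05}{5\times10^{-9}} = 10^{7},\qquad
\frac{0.05}{1\times10^{-9}} = 5\times10^{7}.
\]
```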

8.
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false‐positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re‐matching after genotyping is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc.
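The three GSM steps can be sketched as follows, under simplifying assumptions that are not taken from the paper. Similarity is the mean identity-by-state (IBS) over SNPs coded 0/1/2. Matching is greedy and 1:1 on that score. Step 3 relies on the fact that, for a single SNP, a 1:1 conditional logistic regression reduces to a no-intercept logistic likelihood on within-pair genotype differences; here it is fitted by Newton-Raphson. All function names are hypothetical.

```python
import math


def ibs_similarity(g1, g2):
    """Step 1: mean identity-by-state between two 0/1/2 genotype vectors (1.0 = identical)."""
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2.0 * len(g1))


def greedy_match(cases, controls):
    """Step 2: pair each case, in order, with the most similar control not yet used."""
    unused = list(range(len(controls)))
    pairs = []
    for i, case in enumerate(cases):
        if not unused:
            break
        best = max(unused, key=lambda j: ibs_similarity(case, controls[j]))
        unused.remove(best)
        pairs.append((i, best))
    return pairs


def conditional_logit_1to1(diffs, iterations=25):
    """Step 3 for one SNP: Newton-Raphson estimate of the log odds ratio per allele.

    diffs[k] = case genotype minus matched-control genotype for pair k. Each pair
    contributes log(1 / (1 + exp(-beta * d))) to the conditional log-likelihood.
    """
    beta = 0.0
    for _ in range(iterations):
        score = info = 0.0
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        if info == 0.0:
            break  # no informative (discordant) pairs
        beta += score / info
    return beta
```

Concordant pairs (difference 0) carry no information. If every nonzero difference has the same sign, the likelihood has no finite maximum and the iterations simply drift upward.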

9.
Genome‐wide association studies (GWASs) commonly use marginal association tests for each single‐nucleotide polymorphism (SNP). Because these tests treat SNPs as independent, their power is suboptimal for detecting SNPs hidden by linkage disequilibrium (LD). One way to improve power is to use a multiple regression model. However, the large number of SNPs precludes fitting them simultaneously in a multiple regression, and subset regression is infeasible because of the exorbitant number of candidate subsets. We therefore propose a new method for detecting hidden SNPs, which have weak marginal association but significant association in a multiple regression model. Our method begins by constructing a bidirected graph locally around each SNP that shows a moderately sized marginal association signal (the focal SNPs). Vertices correspond to SNPs, and adjacency between vertices is defined by an LD measure. Next, the method collects from each graph all shortest paths to the focal SNP. Finally, for each shortest path the method fits a multiple regression model to all the SNPs lying on the path and tests the significance of the regression coefficient of the terminal SNP in the path. Simulation studies show that the proposed method can detect susceptibility SNPs hidden by LD that go undetected with marginal association testing or with existing multivariate methods. When applied to real GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), our method detected two groups of SNPs: one in a region containing the apolipoprotein E (APOE) gene, and another in a region close to the semaphorin 5A (SEMA5A) gene.
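The graph-construction and path-collection steps can be sketched with a breadth-first search. Several assumptions are not fixed by the abstract. Adjacency here means pairwise r² at or above a chosen threshold, and edges are treated as undirected. Each path is listed from the focal SNP to its terminal SNP. The regression-fitting step is omitted, and the function names are hypothetical.

```python
from collections import deque


def ld_graph(r2, threshold):
    """Adjacency lists: SNPs i and j are adjacent when r2[i][j] >= threshold."""
    n = len(r2)
    return {i: [j for j in range(n) if j != i and r2[i][j] >= threshold] for i in range(n)}


def shortest_paths_to_focal(adj, focal):
    """All shortest paths from the focal SNP to every other reachable SNP (BFS)."""
    dist = {focal: 0}
    preds = {focal: []}  # every predecessor that lies on some shortest path
    queue = deque([focal])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)

    def expand(v):
        # rebuild every path focal -> ... -> v from the predecessor lists
        if v == focal:
            return [[focal]]
        return [path + [v] for u in preds[v] for path in expand(u)]

    return {v: expand(v) for v in dist if v != focal}
```

Each returned path would then supply the SNP set for one multiple regression, with the test applied to the coefficient of the path's last SNP.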

10.
11.
Glioblastoma multiforme (GBM), the most common type of malignant brain tumor, is highly fatal. Limited understanding of its rapid progression calls for additional approaches that integrate what is known about the genomics of this cancer. Using a discovery set (n = 348) and a validation set (n = 174) of GBM patients, we performed genome‐wide analyses that integrated mRNA and micro‐RNA expression data from GBM with associated survival information, assessing the coordinated variability of each, as this reflects their known mechanistic functions. Cox proportional hazards models were used for the survival analyses, and nonparametric permutation tests were performed for the micro‐RNAs to investigate the association between the number of genes associated with each micro‐RNA and its prognostic value. We also used mediation analyses for micro‐RNA‐gene pairs to identify mediation effects. Genome‐wide analyses revealed a novel pattern: micro‐RNAs associated with the expression of more genes are more likely to be associated with GBM survival (P = 4.8 × 10⁻⁵). Genome‐wide mediation analyses of the 32,660 micro‐RNA‐gene pairs with strong association (false discovery rate [FDR] < 0.01%) identified 51 validated pairs with a significant mediation effect. Of the 51 pairs, miR‐223 had 16 mediation genes. These 16 mediation genes of miR‐223 were also highly associated with various other micro‐RNAs and mediated their prognostic effects as well. We further constructed a gene signature from the 16 genes, which was highly associated with GBM survival in both the discovery and validation sets (P = 9.8 × 10⁻⁶). This comprehensive study identified gene expression as a mediator of micro‐RNA effects on GBM survival and provides a new analytic framework for integrative genomics.

12.
A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous single nucleotide polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of ‘tag SNPs’ for association is unknown. Compared with tests for association based on haplotype frequencies, recent evidence suggests that tests based on linear combinations of the tag SNPs (the Hotelling T² test) are more powerful. Following this logical progression, we asked whether single‐locus tests would prove generally more powerful than the regression‐based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics, corrected for multiple testing either by the Bonferroni procedure, TB, or by permutation of case‐control status, TP; a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, TS; and the Hotelling T² procedure, which we call TR. These procedures are evaluated by simulating data resembling those from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, TS has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10–20 SNPs per gene), the power of TP is generally superior to that of the other procedures, including TR.
Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of TP and TR are indistinguishable. Genet. Epidemiol. © 2005 Wiley‐Liss, Inc.
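The two max-statistic procedures, TB and TP, can be sketched as below. The per-SNP statistic is an assumption: an Armitage-type trend statistic computed as N·r² between 0/1/2 genotype scores and 0/1 case status, referred to a 1-df chi-square distribution. TS and TR are not shown, and the function names are hypothetical.

```python
import math
import random


def trend_stat(genotype, status):
    """Armitage-type trend statistic N * r^2 (genotype 0/1/2 vs. case status 0/1)."""
    n = len(status)
    mg = sum(genotype) / n
    ms = sum(status) / n
    sgs = sum((g - mg) * (s - ms) for g, s in zip(genotype, status))
    sgg = sum((g - mg) ** 2 for g in genotype)
    sss = sum((s - ms) ** 2 for s in status)
    if sgg == 0 or sss == 0:
        return 0.0  # monomorphic SNP or a single outcome class
    return n * sgs * sgs / (sgg * sss)


def max_stat(snps, status):
    """Largest per-SNP statistic across the tag SNPs of a gene."""
    return max(trend_stat(g, status) for g in snps)


def t_bonferroni(snps, status):
    """T_B: 1-df chi-square p-value of the max statistic, times the number of SNPs."""
    p_min = math.erfc(math.sqrt(max_stat(snps, status) / 2.0))  # P(chi2_1 > x)
    return min(1.0, len(snps) * p_min)


def t_permutation(snps, status, n_perm=999, seed=7):
    """T_P: rank the observed max statistic among maxima from permuted case-control labels."""
    rng = random.Random(seed)
    observed = max_stat(snps, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_stat(snps, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Permuting case-control labels keeps the LD among the SNPs intact. For that reason, TP adapts to correlated tag SNPs, whereas the Bonferroni factor in TB assumes the worst case.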

13.
14.
Diseases often co‐occur in individuals at rates higher than expected by chance, which may be explained by a shared underlying genetic etiology. A common approach to genetic overlap analysis is to use summary genome‐wide association study data to identify single‐nucleotide polymorphisms (SNPs) that are associated with multiple traits at a selected P‐value threshold. However, P‐values do not account for differences in power, whereas Bayes factors (BFs) do, and BFs may be approximated using summary statistics. We use simulation studies to compare the power of frequentist and Bayesian approaches to overlap analysis, and to decide on appropriate thresholds for comparing the two methods. We illustrate empirically that, for single‐disease associations, BFs have an advantage over P‐values: their type I error rate decreases as study size increases. Consequently, overlap analysis of traits from studies of different sizes raises issues in choosing fair P‐value thresholds, whereas BFs adjust automatically. Extensive simulations show that Bayesian overlap analyses tend to have higher power than those that assess association strength with P‐values, particularly in low‐power scenarios. Calibration tables between BFs and P‐values are provided for a range of sample sizes, as well as an approximation approach for sample sizes that are not in the calibration table. Although P‐values are sometimes thought more intuitive, these tables help remove the opacity of Bayesian thresholds and may also be used to select a BF threshold that meets a given type I error rate. We apply our methods to identify variants associated with both obesity and osteoarthritis.
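One widely used way to approximate a Bayes factor from summary statistics is Wakefield's approximate Bayes factor. It needs only an effect estimate, its standard error, and a prior standard deviation for the true effect. The abstract does not say which approximation the authors use, so treat this as an illustration. The prior SD of 0.2 on the log-odds scale is an assumption, and the function name is hypothetical.

```python
import math


def wakefield_abf(beta_hat, se, prior_sd=0.2):
    """Wakefield-style approximate Bayes factor of H0 (no effect) against H1.

    Values below 1 favour association. Prior: true effect ~ N(0, prior_sd**2) under H1.
    """
    v = se ** 2          # sampling variance of beta_hat
    w = prior_sd ** 2    # prior variance of the true effect
    z2 = (beta_hat / se) ** 2
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))
```

Hold the z-statistic (and therefore the P-value) fixed: as the standard error shrinks, this ABF moves towards weaker evidence. This is why a single P-value threshold does not correspond to a single BF threshold across studies of different sizes.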

15.
We study the problem of testing for single marker‐multiple phenotype associations based on genome‐wide association study (GWAS) summary statistics, without access to individual‐level genotype and phenotype data. This problem has become increasingly important because, for most published GWASs, summary data are substantially easier to obtain than individual‐level phenotype and genotype data, and multiple correlated traits have often been collected. We propose a powerful adaptive test and compare its performance with that of some existing tests. We illustrate its application to a meta‐analyzed GWAS dataset with three blood lipid traits and to another with sex‐stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta‐analyzed) genome‐wide summary statistics, then extend the method to meta‐analysis of multiple sets of genome‐wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice, being more powerful than or complementary to existing methods.

16.
When analysing multicentre data, it may be of interest to test whether the distribution of the endpoint varies among centres. In a mixed‐effects model, testing for such a centre effect amounts to testing whether the variance component of a random centre effect equals zero. It has been shown that the usual asymptotic χ² distribution of the likelihood ratio and score statistics under the null does not necessarily hold. For censored data, mixed‐effects Cox models have been used to account for random effects, but few works have focused on testing whether the variance component of the random effects is zero. We propose a permutation test, using random permutation of the cluster indices, to test for a centre effect in multilevel censored data. Results from a simulation study indicate that the permutation tests have correct type I error rates, unlike standard likelihood ratio tests, and are more powerful. The proposed tests are illustrated using data from a multicentre clinical trial of induction therapy in patients with acute myeloid leukaemia. Copyright © 2014 John Wiley & Sons, Ltd.

17.
Whole genome association studies are generating data sets with hundreds of thousands of markers genotyped on thousands of cases and controls. We show that whole genome haplotypic association testing with permutation to account for multiple testing is statistically powerful and computationally feasible on such data, using an efficient software implementation of a recently proposed method. We use realistic simulations to explore the statistical properties of the method, and show that for ungenotyped disease-susceptibility variants with population frequencies of 5% or less the haplotypic tests have markedly better power than single-marker tests. We propose a combined single-marker and haplotypic strategy in which both single-marker and haplotypic tests are applied and the minimum P-value is adjusted for multiple testing by permutation; this yields a test that is powerful for detecting both low- and high-frequency disease-susceptibility variants.

18.
Genetic heterogeneity, which may manifest on a population level as different frequencies of a specific disease susceptibility allele in different subsets of patients, is a common problem for candidate gene and genome‐wide association studies of complex human diseases. The ordered subset analysis (OSA) was originally developed as a method to reduce genetic heterogeneity in the context of family‐based linkage studies. Here, we have extended a previously proposed method (OSACC) for applying the OSA methodology to case‐control datasets. We have evaluated the type I error and power of different OSACC permutation tests with an extensive simulation study. Case‐control datasets were generated under two different models by which continuous clinical or environmental covariates may influence the relationship between susceptibility genotypes and disease risk. Our results demonstrate that OSACC is more powerful under some disease models than the commonly used trend test and a previously proposed joint test of main genetic and gene‐environment interaction effects. An additional unique benefit of OSACC is its ability to identify a more informative subset of cases that may be subjected to more detailed molecular analysis, such as DNA sequencing of selected genomic regions to detect functional variants in linkage disequilibrium with the associated polymorphism. The OSACC‐identified covariate threshold may also improve the power of an additional dataset to replicate previously reported associations that may only be detectable in a fraction of the original and replication datasets. In summary, we have demonstrated that OSACC is a useful method for improving SNP association signals in genetically heterogeneous datasets. Genet. Epidemiol. 34: 407–417, 2010. © 2010 Wiley‐Liss, Inc.
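The ordered-subset idea for case-control data works roughly as follows. Cases are ranked by the covariate, and the allelic association statistic is recomputed for the top-k cases against all controls, for every k. The maximum is kept, and its significance is assessed by permuting the covariate among cases. The minimal sketch below assumes a descending order, an allelic chi-square, and this one permutation scheme; the paper evaluates several OSACC permutation tests. The function names are hypothetical.

```python
import random


def allelic_chi2(case_geno, control_geno):
    """Pearson chi-square for the 2x2 table of allele counts (genotypes coded 0/1/2)."""
    a = sum(case_geno)
    b = 2 * len(case_geno) - a
    c = sum(control_geno)
    d = 2 * len(control_geno) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def osacc_max(case_geno, case_cov, control_geno, min_cases=1):
    """Max statistic over subsets of the top-k cases ranked by covariate (descending)."""
    order = sorted(range(len(case_geno)), key=lambda i: case_cov[i], reverse=True)
    best_stat, best_k = -1.0, 0
    for k in range(min_cases, len(order) + 1):
        stat = allelic_chi2([case_geno[i] for i in order[:k]], control_geno)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k


def osacc_pvalue(case_geno, case_cov, control_geno, n_perm=999, seed=3):
    """Permutation p-value: shuffle covariate values among cases, keep genotypes fixed."""
    rng = random.Random(seed)
    observed, _ = osacc_max(case_geno, case_cov, control_geno)
    cov = list(case_cov)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cov)
        if osacc_max(case_geno, cov, control_geno)[0] >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The k at which the maximum occurs corresponds to a covariate cut-off. That cut-off is the kind of threshold the abstract suggests could guide follow-up sequencing or replication.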

19.
A permutation test assigns a p‐value by conditioning on the data and treating the different possible treatment assignments as random. The fact that the conditional type I error rate given the data is controlled at level α ensures validity of the test even if certain adaptations are made. We show the connection between permutation and t‐tests, and use this connection to explain why certain adaptations are valid in a t‐test setting as well. We illustrate this with an example of blinded sample size recalculation. Copyright © 2014 John Wiley & Sons, Ltd.
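The connection can be illustrated with a two-sample comparison. Conditional on the pooled data, the pooled-variance t statistic is a monotone function of the difference in means. An exact permutation test based on |x̄ − ȳ| therefore gives the same p-value as one based on |t|. The sketch below enumerates every reassignment of group labels, so it is practical only for small samples. The function name is hypothetical.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = list(x) + list(y)
    n, nx = len(pooled), len(x)
    total = sum(pooled)

    def abs_mean_diff(group_sum):
        return abs(group_sum / nx - (total - group_sum) / (n - nx))

    observed = abs_mean_diff(sum(x))
    extreme = count = 0
    for idx in combinations(range(n), nx):  # every possible assignment of the nx labels
        count += 1
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / count
```

For x = [1, 2, 3] and y = [4, 5, 6], only the observed split and its mirror image reach a mean difference of 3. Out of 20 possible assignments, that gives p = 0.1.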

20.
There is growing recognition that interactions (gene‐gene and gene‐environment) can play an important role in common disease etiology. The development of cost‐effective genotyping technologies has made genome‐wide association studies the preferred tool for searching for loci affecting disease risk. These studies are characterized by a large number of investigated SNPs, so efficient statistical methods are even more important than in classical association studies with a small number of markers. In this article we propose a novel gene‐gene interaction test that is more powerful than classical methods. The gain in power comes from incorporating reasonable constraints on the parameter space. The test for both association and interaction is based on a likelihood ratio statistic that asymptotically has a χ̄² (chi‐bar‐squared) distribution. We also discuss the definitions used for “no interaction” and argue that tests for pure interaction are useful in genome‐wide studies, especially with two‐stage strategies in which the second‐stage analyses are done on pairs of loci of which at least one is associated with the trait. Genet. Epidemiol. 33:386–393, 2009. © 2008 Wiley‐Liss, Inc.
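Take the simplest constrained case: a single parameter restricted to be non-negative that lies on the boundary under the null. There, the likelihood ratio statistic follows the mixture ½χ²₀ + ½χ²₁, so its p-value is half the usual 1-df tail probability. The mixture weights for the interaction test described above depend on its particular constraints, which the abstract does not give. This sketch covers only the one-parameter case, and the function name is hypothetical.

```python
import math


def chibar_pvalue_one_parameter(lrt):
    """Upper-tail probability of 0.5*chi2(0) + 0.5*chi2(1) at the observed LRT value."""
    if lrt <= 0.0:
        return 1.0  # the point mass at zero puts every observation at or above 0
    # chi2(0) contributes nothing above 0; P(chi2(1) > t) = erfc(sqrt(t / 2))
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))
```

An LRT value of 3.84, which gives p ≈ 0.05 against a plain 1-df chi-square, gives p ≈ 0.025 under this mixture. Using the correct reference distribution therefore gains power relative to the unconstrained test.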


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417