Similar articles
20 similar articles found.
1.
Genome‐wide association studies (GWAS) have been widely used to identify genetic effects on complex diseases or traits. Most currently used methods are based on separate single‐nucleotide polymorphism (SNP) analyses. Because this approach requires correction for multiple testing to avoid excessive false‐positive results, it suffers from reduced power to detect weak genetic effects under limited sample size. To increase the power to detect multiple weak genetic factors and reduce false‐positive results caused by multiple tests and dependence among test statistics, a modified forward multiple regression (MFMR) approach is proposed. Simulation studies show that MFMR has higher power than the Bonferroni and false discovery rate procedures for detecting moderate and weak genetic effects, and MFMR retains an acceptable false‐positive rate even if causal SNPs are correlated with many SNPs due to population stratification or other unknown reasons. Genet. Epidemiol. 33:518–525, 2009. © 2009 Wiley‐Liss, Inc.
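The abstract benchmarks MFMR against the Bonferroni and false discovery rate procedures. As a rough illustration of those two baselines only (MFMR itself is not reproduced here), the Python sketch below applies Bonferroni and Benjamini-Hochberg step-up control to a hypothetical list of per-SNP p-values; the function names and p-values are illustrative assumptions.

```python
def bonferroni(pvals, alpha=0.05):
    """Indices of tests rejected with family-wise error rate controlled at alpha."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up FDR procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    return sorted(order[:k_max])


pvals = [0.0001, 0.004, 0.012, 0.02, 0.3, 0.7]   # hypothetical per-SNP p-values
print(bonferroni(pvals))           # [0, 1]
print(benjamini_hochberg(pvals))   # [0, 1, 2, 3]
```

The FDR procedure rejects more hypotheses here because its threshold grows with rank, whereas Bonferroni uses alpha / m for every test.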

2.
Genome-wide association studies typically test large numbers of genetic variants in association with trait values. It is well known that linkage disequilibrium (LD) between nearby markers tends to introduce correlation among association tests. Failure to properly adjust for multiple comparisons can lead to false-positive results or to missed true-positive signals. The Bonferroni correction is generally conservative in the presence of LD. The permutation procedure, although widely employed to adjust for correlated tests, is not applicable when related individuals are included in case-control samples. With related individuals, the dependence among relatives' genotypes can also contribute to the correlation between tests. We present a new method P(norm) to correct for multiple hypothesis testing in case-control association studies in which some individuals are related. The adjustment with P(norm) simultaneously accounts for two sources of correlation among the test statistics: (1) LD among genetic markers and (2) dependence among genotypes across related individuals. Using simulated data based on the International HapMap Project, we demonstrate that it has better control of type I error and is more powerful than some of the recently developed methods. We apply the method to a genome-wide association study of alcoholism in the GAW 14 COGA data set and detect a genome-wide significant association.

3.
We investigate methods for testing gene‐disease outcome associations in situations where the genetic relationship potentially varies among subjects with differing environmental or clinical attributes. We propose a strategy which modestly increases multiple testing by evaluating weighted test statistics which focus (or enrich) association tests within subgroups and use a Monte‐Carlo method, based on simulating from the approximate large sample distribution of the statistics, to control type 1 error. We also introduce a stage‐wise calculated test statistic which allows more complex weighting on multiple environmental variables. Results from simulation studies confirm improved power of the proposed approaches compared to marginal testing in many situations. Genet. Epidemiol. 33:442–452, 2009. © 2009 Wiley‐Liss, Inc.
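The Monte-Carlo calibration idea can be sketched for the simplest case of two statistics, for example a marginal Z and one subgroup-weighted Z, whose approximate joint null distribution is assumed to be bivariate normal with a known correlation rho. This is a hedged illustration under that assumption, not the authors' weighting scheme; mc_max_pvalue and its parameters are hypothetical.

```python
import math
import random


def mc_max_pvalue(z_obs, rho, n_sim=20_000, seed=1):
    """Monte Carlo p-value for T = max(|Z1|, |Z2|).

    Under the null, (Z1, Z2) is assumed bivariate normal with unit variances
    and known correlation rho; null draws use the Cholesky factor of the
    2 x 2 correlation matrix.
    """
    rng = random.Random(seed)
    t_obs = max(abs(z) for z in z_obs)
    s = math.sqrt(1.0 - rho * rho)
    exceed = 0
    for _ in range(n_sim):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        z1, z2 = e1, rho * e1 + s * e2      # correlated null draw
        if max(abs(z1), abs(z2)) >= t_obs:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)
```

Taking the maximum over the two statistics is what "modestly increases multiple testing": the simulated null of the maximum accounts for their correlation, so the penalty is smaller than a Bonferroni factor of 2 when rho is large.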

4.
Genetic association studies often collect data on multiple traits that are correlated. Discovery of genetic variants influencing multiple traits can lead to better understanding of the etiology of complex human diseases. Conventional univariate association tests may miss variants that have weak or moderate effects on individual traits. We propose several multivariate test statistics to complement univariate tests. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. We relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without modeling the dependence among the traits or family members. We construct score‐type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We compare the power of the test statistics both theoretically and empirically. We provide a strategy to determine genome‐wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. The application of the new methods to the meta‐analysis of five major cardiovascular cohort studies identifies a new locus (HSCB) that is pleiotropic for the four traits analyzed.

5.
High‐throughput sequencing technologies have enabled large‐scale studies of the role of the human microbiome in health conditions and diseases. Microbial community‐level association testing, a critical step in establishing the connection between overall microbiome composition and an outcome of interest, is now routinely performed in many studies. However, current microbiome association tests all focus on a single outcome. It has become increasingly common for a microbiome study to collect multiple, possibly related, outcomes to maximize the power of discovery. As these outcomes may share common mechanisms, jointly analyzing these outcomes can amplify the association signal and improve statistical power to detect potential associations. We propose the multivariate microbiome regression‐based kernel association test (MMiRKAT) for testing association between multiple continuous outcomes and overall microbiome composition, where the kernel used in MMiRKAT is based on Bray‐Curtis or UniFrac distance. MMiRKAT directly regresses all outcomes on the microbiome profiles via a semiparametric kernel machine regression framework, which allows for covariate adjustment and evaluates the association via a variance‐component score test. Because most current microbiome studies have small sample sizes, a novel small‐sample correction procedure is implemented in MMiRKAT to correct for the conservativeness of the association test when the sample size is small or moderate. The proposed method is assessed via simulation studies and an application to a real data set examining the association between host gene expression and mucosal microbiome composition. We demonstrate that MMiRKAT is more powerful than the large‐sample‐based multivariate kernel association test, while controlling the type I error. A free implementation of MMiRKAT in R is available at http://research.fhcrc.org/wu/en.html.
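Kernel association tests in the MiRKAT family commonly turn a distance matrix D into a kernel by Gower centering, K = -1/2 (I - 11'/n) D² (I - 11'/n), with D² taken elementwise. The sketch below assumes MMiRKAT builds its Bray-Curtis kernel this way; it omits any correction that forces K to be positive semidefinite, and the helper names are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two nonnegative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def distance_to_kernel(samples, dist=bray_curtis):
    """Gower-centered kernel K = -1/2 * H (D*D) H with H = I - 11'/n."""
    n = len(samples)
    d2 = [[dist(samples[i], samples[j]) ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand = sum(row_mean) / n
    # Double centering: subtract row and column means, add back the grand mean.
    # d2 is symmetric, so column means equal row means.
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]
```

Each row of the centered kernel sums to zero, which is what makes it usable directly in a variance-component score statistic with an intercept in the null model.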

6.
Next‐generation sequencing technologies have afforded unprecedented characterization of low‐frequency and rare genetic variation. Due to low power for single‐variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel‐machine regression and adaptive testing methods for aggregative rare‐variant association testing have been demonstrated to be powerful approaches for pathway‐level analysis, although these methods tend to be computationally intensive at high‐variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare‐variant analysis using component gene‐level linear kernel score test summary statistics, and derive simple estimators of the effective number of tests for family‐wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case‐control studies evaluating rare variation in hereditary prostate cancer and schizophrenia, respectively. Finally, we provide open‐source R code for public use to facilitate easy application of our methods to existing rare‐variant analysis results.
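The paper derives its own estimators of the effective number of tests; those are not reproduced here. As a stand-in illustration of the general idea, the sketch below implements a different, widely used estimator from Nyholt (2004), M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the M × M correlation matrix of the test statistics and Var is their sample variance. Because those eigenvalues average 1 and their squares sum to Σ r_ij², no eigendecomposition is needed.

```python
def effective_number_of_tests(corr):
    """Nyholt-style M_eff = 1 + (M - 1) * (1 - Var(lambda) / M).

    corr is an M x M correlation matrix (list of lists).  The eigenvalues of a
    correlation matrix sum to M and their squares sum to sum_ij r_ij**2, so
    their sample variance is (sum_ij r_ij**2 - M) / (M - 1).
    """
    m = len(corr)
    if m == 1:
        return 1.0
    sum_sq = sum(r * r for row in corr for r in row)
    var_lambda = (sum_sq - m) / (m - 1)
    return 1 + (m - 1) * (1 - var_lambda / m)
```

A family-wise threshold is then taken as alpha / M_eff instead of alpha / M. Independent tests give M_eff = M and perfectly correlated tests give M_eff = 1.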

7.
Kernel machine based association tests (KAT) have been increasingly used in testing the association between an outcome and a set of biological measurements due to their power to combine multiple weak signals of complex relationship with the outcome through the specification of a relevant kernel. Human genetic and microbiome association studies are two important applications of KAT. However, the classic KAT framework relies on large sample theory, and conservativeness has been observed for small sample studies, especially for microbiome association studies. The common approach for addressing the small sample problem relies on computationally intensive resampling methods. Here, we derive an exact test for KAT with continuous traits, which resolves the small sample conservatism of KAT without the need for resampling. The exact test has significantly improved power to detect association for microbiome studies. For binary traits, we propose a similar approximate test, and we show that the approximate test is very powerful for a wide range of kernels, including common variant‐ and microbiome‐based kernels, and that it controls the type I error well for these kernels. In contrast, the sequence kernel association tests have slightly inflated genomic inflation factors after small sample adjustment. Extensive simulations and application to a real microbiome association study are used to demonstrate the utility of our method.
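For a continuous trait with an intercept-only null model, the kernel association score statistic is Q = (y - ȳ)' K (y - ȳ), up to a residual-variance scale that does not change when y is permuted. The sketch below computes Q and the permutation p-value, that is, the resampling route the abstract describes as computationally intensive; it does not implement the proposed exact test, and the function names are hypothetical.

```python
import random


def kat_score_statistic(y, kernel):
    """Q = r' K r with r = y - mean(y): continuous trait, no covariates.

    The usual scaling by the residual variance is omitted because that
    variance is unchanged when y is permuted.
    """
    n = len(y)
    ybar = sum(y) / n
    r = [v - ybar for v in y]
    return sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_pvalue(y, kernel, n_perm=2000, seed=0):
    """Permutation p-value for Q by shuffling the trait values."""
    rng = random.Random(seed)
    q_obs = kat_score_statistic(y, kernel)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if kat_score_statistic(y_perm, kernel) >= q_obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Each permutation costs O(n²) kernel evaluations here, and small p-values need many permutations, which is the computational burden an exact small-sample test avoids.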

8.
Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next‐generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross‐phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time, and using such longitudinal data can further increase the power to detect genetic associations. Although rare‐variant approaches exist for testing cross‐phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. To fill this important gap, we propose an extension of the Gene Association with Multiple Traits (GAMuT) test, a method for cross‐phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome‐wide scale due to the use of a closed‐form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy.
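GAMuT is described as resting on a distance-covariance framework. As a minimal sketch of that building block only (not GAMuT's own statistic or its longitudinal extension), the code below computes the squared sample distance covariance of Székely and colleagues between paired multivariate phenotype and genotype vectors, using double-centered Euclidean distance matrices. The function names are hypothetical.

```python
import math


def _double_centered_distances(points):
    """Euclidean distance matrix with row, column and grand means removed."""
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance between paired observations
    x[i] and y[i] (each a sequence of floats)."""
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
```

The statistic is zero when either side is constant and grows with any dependence between the two sets of vectors, not only linear dependence, which is why it suits joint phenotype-genotype testing.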

9.
In genetic association studies, multiple markers are usually employed to cover a genomic region of interest for localizing a trait locus. In this report, we propose a novel multi-marker family-based association test (T(LC)) that linearly combines the single-marker test statistics using data-driven weights. We examine its type-I error rate in a numerical study and compare its power to identify a common trait locus, using tag single nucleotide polymorphisms (SNPs) within the haplotype block in which the trait locus resides, with that of three competing tests: a global haplotype test (T(H)), a multi-marker test similar to Hotelling's T² test for population-based data (T(MM)), and a single-marker test with Bonferroni's correction for multiple testing (T(B)). The type-I error rate of T(LC) is well maintained in our numerical study. In all the scenarios we examined, T(LC) is the most powerful, followed by T(B); T(MM) and T(H) perform worst. T(H) and T(MM) have essentially the same power when parents are available. However, when both parents are missing, T(MM) is substantially more powerful than T(H). We also apply this new test to a data set from a previous association study on nicotine dependence.
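A linear combination test of single-marker statistics can be sketched as Z_LC = w'z / sqrt(w'Σw), which is standard normal under the null when z ~ N(0, Σ) and the weights w are fixed. The weights passed in are placeholders: the paper's data-driven weights are not reproduced, and weights estimated from the same data would require accounting for that estimation step. The function name is hypothetical.

```python
import math


def linear_combination_test(z, sigma, w):
    """Combine correlated single-marker Z statistics z (null covariance sigma)
    with fixed weights w.  Returns (Z_LC, two-sided normal p-value)."""
    k = len(z)
    num = sum(w[i] * z[i] for i in range(k))
    var = sum(w[i] * sigma[i][j] * w[j] for i in range(k) for j in range(k))
    z_lc = num / math.sqrt(var)
    return z_lc, math.erfc(abs(z_lc) / math.sqrt(2))
```

With two independent markers each at Z = 2 and equal weights, Z_LC = 4/√2 ≈ 2.83; if the markers were in perfect LD, the same inputs would give only Z_LC = 2, because the denominator reflects their correlation.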

10.
Accurate genetic association studies are crucial for the detection and the validation of disease determinants. One of the main confounding factors that affect accuracy is population stratification, and great efforts have been expended over the past decade to detect and adjust for it. We now have efficient solutions for population stratification adjustment for single‐SNP (where SNP is single‐nucleotide polymorphism) inference in genome‐wide association studies, but it is unclear whether these solutions can be effectively applied to rare variation studies, and in particular to gene‐based (or set‐based) association methods that jointly analyze multiple rare and common variants. We examine here, both theoretically and empirically, the performance of two commonly used approaches for population stratification adjustment (genomic control and principal component analysis) when used on gene‐based association tests. We show that, unlike in single‐SNP inference, genes with diverse compositions of rare and common variants may suffer from population stratification to varying extents. The inflation in gene‐level statistics can be affected by the number and the allele frequency spectrum of SNPs in the gene, and by the gene‐based testing method used in the analysis. As a consequence, using a universal inflation factor for genomic control should be avoided in gene‐based inference with sequencing data. We also demonstrate that caution needs to be exercised when using principal component adjustment, because the accuracy of the adjusted analyses depends on the underlying population substructure, on the way the principal components are constructed, and on the number of principal components used to recover the substructure.
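For reference, the genomic-control inflation factor for 1-df single-SNP statistics is the median observed χ² divided by the null median of χ²₁ (about 0.4549). The sketch below computes it with the standard library. The abstract's point is that one such universal λ should not be applied to gene-based statistics, whose inflation varies with gene composition; the function names are hypothetical.

```python
import statistics

# Median of a 1-df chi-square: the squared upper quartile of N(0, 1), about 0.4549.
CHI2_1_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Genomic-control inflation factor for 1-df association statistics."""
    return statistics.median(chi2_stats) / CHI2_1_MEDIAN


def gc_adjust(chi2_stats):
    """Divide each statistic by lambda, only when lambda exceeds 1."""
    lam = max(1.0, genomic_control_lambda(chi2_stats))
    return [s / lam for s in chi2_stats]
```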

11.
By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative, since they generate type I error rates below the nominal levels, and global tests of the mixed effect models generate accurate type I error rates. Furthermore, we found that Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT‐O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT‐O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT‐O on real neural tube defects and Hirschsprung's disease datasets. Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT‐O in the real data analysis. Our methods can be used in either gene‐disease genome‐wide/exome‐wide association studies or candidate gene analyses.

12.
In the field of gene set enrichment analysis (GSEA), meta‐analysis has been used to integrate information from multiple studies to present a reliable summarization of the expanding volume of individual biomedical research, as well as to improve the power of detecting essential gene sets involved in complex human diseases. However, existing methods such as Meta‐Analysis for Pathway Enrichment (MAPE) may suffer power loss because they (1) use gross summary statistics to combine end results from component studies and (2) use enrichment scores whose distributions depend on the set sizes. In this paper, we adapt meta‐analysis approaches recently developed for genome‐wide association studies, which are based on fixed effects (FE) and random effects (RE) models, to integrate multiple GSEA studies. We further develop a mixed strategy via adaptive testing for choosing between RE and FE models to achieve greater statistical efficiency as well as flexibility. In addition, a size‐adjusted enrichment score based on a one‐sided Kolmogorov‐Smirnov statistic is proposed to formally account for varying set sizes when testing multiple gene sets. Our methods tend to perform much better than the MAPE methods and can be applied to both discrete and continuous phenotypes. Specifically, the performance of the adaptive testing method seems to be the most stable in general situations.
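The size-adjusted enrichment score builds on a one-sided Kolmogorov-Smirnov running sum. The sketch below computes the unweighted one-sided score and, as an assumed form of size adjustment, multiplies it by the usual two-sample KS normalizer sqrt(n_hit · n_miss / n); the paper's exact adjustment may differ. Gene identifiers and function names are hypothetical, and the set must contain at least one, but not all, of the ranked genes.

```python
import math


def ks_enrichment_score(ranked_genes, gene_set):
    """One-sided KS running-sum score for gene_set along ranked_genes
    (ordered from most to least associated).  Hits step up by 1/n_hit,
    misses step down by 1/n_miss; the score is the largest positive excursion.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(in_set) - n_hit
    running = best = 0.0
    for hit in in_set:
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        best = max(best, running)
    return best


def size_adjusted_score(ranked_genes, gene_set):
    """Assumed size adjustment: scale by sqrt(n_hit * n_miss / n) so that
    null scores are approximately comparable across set sizes."""
    n = len(ranked_genes)
    n_hit = sum(g in gene_set for g in ranked_genes)
    return ks_enrichment_score(ranked_genes, gene_set) * math.sqrt(n_hit * (n - n_hit) / n)
```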

13.
Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression have rarely been developed. Motivated by our ongoing real studies, here we develop Cox proportional hazards models using functional regression (FR) to perform gene‐based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well‐controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and the sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher or similar power compared with Cox SKAT LRT, except when half of the causal variants have negative effects and half have positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in whole‐genome and whole‐exome association studies. An age‐related macular degeneration dataset was analyzed as an example.

14.
Hao K, Xu X, Laird N, Wang X, Xu X. Genetic Epidemiology, 2004, 26(1):22-30
Currently, a large number of single nucleotide polymorphisms (SNPs) are being deployed in searching for genes underlying complex diseases. A powerful method is desirable for efficient analysis of SNP data. Recently, a novel method for multiple SNP association testing using a combination of allelic association (AA) and Hardy-Weinberg disequilibrium (HWD) has been proposed. However, the power of this test has not been systematically examined. In this study, we conducted a simulation study to further evaluate the statistical power of the new procedure, as well as the influence of HWD on its performance. The simulation examined scenarios of multiple disease SNPs among a candidate pool, assuming different parameters including allele frequencies and risk ratios, dominant, additive, and recessive genetic models, and the existence of gene-gene interactions and linkage disequilibrium (LD). We also evaluated the performance of this test in capturing real disease-associated SNPs when a significant global P value is detected. Our results suggest that this new procedure is more powerful than conventional single-point analyses with correction for multiple testing. However, inclusion of HWD reduces the power under most circumstances. We applied the novel association test procedure to a case-control study of preterm delivery (PTD), examining the effects of 96 candidate gene SNPs concurrently, and detected a global P value of 0.0250 by using Cochran-Armitage χ² statistics as "starting" statistics in the procedure. In the following single-point analysis, SNPs on the IL1RN, IL1R2, ESR1, Factor 5, and OPRM1 genes were identified as possible risk factors in PTD.
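The procedure above uses Cochran-Armitage trend χ² statistics as its starting statistics. A sketch of the single-SNP trend test for a 2 × 3 genotype table with additive scores (0, 1, 2) follows; the global AA/HWD combination step is not reproduced, and the counts in the usage line are made up.

```python
import math


def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k genotype table.

    cases[i], controls[i]: counts with genotype i (e.g. 0, 1, 2 copies of the
    minor allele).  Returns (chi-square statistic on 1 df, p-value).
    """
    k = len(scores)
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [cases[i] + controls[i] for i in range(k)]
    T = sum(scores[i] * (cases[i] * S - controls[i] * R) for i in range(k))
    var = sum(scores[i] ** 2 * n[i] * (N - n[i]) for i in range(k))
    var -= 2 * sum(scores[i] * scores[j] * n[i] * n[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= R * S / N
    chi2 = T * T / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square(1)


chi2, p = cochran_armitage([10, 20, 30], [30, 20, 10])   # hypothetical counts
print(round(chi2, 6))   # 20.0
```

Under additive scores the statistic equals N times the squared correlation between genotype score and case status, which is a convenient cross-check.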

15.
The choice of significance test when the number of repeats is small in microarray experiments can dramatically influence the results. In two-sample comparisons where both conditions have fewer than, say, five repeats, traditional test statistics require extreme results before a gene is considered significantly differentially expressed after a multiple comparisons correction. Many approaches to circumvent this problem have been proposed in the literature. Some of these proposals use (empirical) Bayes arguments to moderate the variance estimates for individual genes. Other proposals try to stabilize these variance estimates by combining groups of genes or similar experiments. In this paper we compare several of these approaches, both on data sets where both experimental conditions are the same, so that few genes should be identified as significantly differentially expressed, and on experiments where the conditions do differ. This allows us to identify which approaches are most powerful without identifying many false positives. We conclude that, after balancing the numbers of false positives and true positives, an empirical Bayes approach and an approach which combines experiments perform best. Standard t-tests are inferior and offer almost no power when the sample size is small.
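Empirical Bayes variance moderation can be sketched as shrinking each gene's pooled variance s² (d degrees of freedom) toward a prior value s0² carrying d0 prior degrees of freedom, s̃² = (d0 s0² + d s²)/(d0 + d), and then using s̃² in the two-sample t statistic. The prior parameters are taken as given here; in practice (for example in limma) they are estimated from all genes, which this sketch does not do. The function name is hypothetical.

```python
import math
import statistics


def moderated_t(group1, group2, d0, s0_sq):
    """Two-sample t statistic with a moderated (shrunken) variance.

    The pooled gene-wise variance s^2 (d = n1 + n2 - 2 df) is shrunk toward
    s0_sq with d0 prior df.  Returns (t statistic, total df d + d0).
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2
    s_sq = ((n1 - 1) * statistics.variance(group1)
            + (n2 - 1) * statistics.variance(group2)) / d
    s_mod_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    diff = statistics.mean(group1) - statistics.mean(group2)
    return diff / math.sqrt(s_mod_sq * (1 / n1 + 1 / n2)), d0 + d
```

With d0 = 0 this reduces to the ordinary pooled t-test; a positive d0 stops genes with accidentally tiny variances from producing extreme statistics when there are only two or three repeats per condition.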

16.
We study the problem of testing for single marker‐multiple phenotype associations based on genome‐wide association study (GWAS) summary statistics without access to individual‐level genotype and phenotype data. Because summary data for most published GWASs are substantially easier to obtain than individual‐level phenotype and genotype data, and multiple correlated traits are often collected, this problem has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta‐analyzed GWAS dataset with three blood lipid traits and another with sex‐stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta‐analyzed) genome‐wide summary statistics, then extend the method to meta‐analysis of multiple sets of genome‐wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.
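As a simple baseline for the single-SNP, multi-trait problem (not the proposed adaptive test), two trait Z-scores from summary statistics can be combined as Q = z'R⁻¹z, which is χ² with 2 df under the null when R is their null correlation matrix. For two traits R⁻¹ has a closed form, and the 2-df upper tail is exactly exp(-Q/2). Estimating rho from null SNPs is an assumption of this sketch, and the function name is hypothetical.

```python
import math


def two_trait_omnibus(z1, z2, rho):
    """Omnibus test of one SNP against two traits from GWAS summary Z-scores.

    rho is the null correlation between the two Z statistics (assumed known,
    for example estimated from genome-wide null SNPs).  Q = z' R^{-1} z is
    chi-square with 2 df under the null; its upper tail is exp(-Q / 2).
    """
    q = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    return q, math.exp(-q / 2)
```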

17.
By jointly analyzing multiple variants within a gene, instead of one at a time, gene‐based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster‐specific effects in a quadratic sum of squares and cross‐products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well‐powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P‐value, variance‐component, and principal‐component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene‐specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome‐wide analysis. The cluster construction of the MLC test statistics helps reveal within‐gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations.

18.
Pearson's chi-squared, the likelihood-ratio, and Fisher-Freeman-Halton's test statistics are often used to test the association of unordered r x c tables. Asymptotic, exact conditional, or exact conditional with mid-p adjustment methods are commonly used to compute the p-value. We have compared test power and significance level for these test statistics and p-value calculations in small sample r x c tables, mostly 3 x 2 and some with both r and c greater than 2. After extensive simulations, in general we recommend using an exact conditional mid-p test with Pearson's chi-squared or Fisher-Freeman-Halton's statistic, which is usually the most powerful test while preserving the approximate significance level. Moreover, we recommend that the asymptotic Pearson's chi-squared or other asymptotic tests not be used for small sample r x c tables.
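The recommended exact conditional mid-p test can be sketched for r x 2 tables (the 3 x 2 case the study focuses on) with Pearson's statistic: enumerate every table with the observed margins, weight each by its conditional (multivariate hypergeometric) probability, and take P(X² > observed) + ½ P(X² = observed). The sketch assumes all row and column totals are positive and is meant for small samples, since the enumeration grows quickly; function names are hypothetical.

```python
import itertools
import math


def pearson_chi2(table):
    """Pearson's X^2 for a table given as a list of rows (all margins > 0)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = 0.0
    for i, r_tot in enumerate(rows):
        for j, c_tot in enumerate(cols):
            expected = r_tot * c_tot / n
            x2 += (table[i][j] - expected) ** 2 / expected
    return x2


def exact_mid_p_rx2(table, tol=1e-9):
    """Exact conditional mid-p value for an r x 2 table using Pearson's X^2."""
    rows = [sum(r) for r in table]
    c1 = sum(r[0] for r in table)          # first-column total
    n = sum(rows)
    denom = math.comb(n, c1)
    x_obs = pearson_chi2(table)
    p_more = p_equal = 0.0
    # Enumerate every table with the observed margins via its first column.
    for first_col in itertools.product(*(range(r + 1) for r in rows)):
        if sum(first_col) != c1:
            continue
        # Conditional probability: multivariate hypergeometric.
        prob = math.prod(math.comb(r, k) for r, k in zip(rows, first_col)) / denom
        x = pearson_chi2([[k, r - k] for r, k in zip(rows, first_col)])
        if x > x_obs + tol:
            p_more += prob
        elif x >= x_obs - tol:
            p_equal += prob
    return p_more + 0.5 * p_equal
```

The tolerance guards against tables whose statistics are mathematically equal but differ in floating point; counting only half of the ties is what distinguishes the mid-p value from the ordinary exact p-value.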

19.
Li Z. Statistics in Medicine, 2001, 20(12):1843-1853
We develop a method for covariate adjustment for a general class of non-parametric tests for censored survival data which includes the widely used logrank and Wilcoxon tests. The covariate-adjusted tests improve the power of the unadjusted counterparts and have advantages over the covariate-adjusted Cox score test when there are outliers in the covariable space. We investigate the small sample properties of such test statistics through Monte Carlo simulations. Examples are given to illustrate the proposed procedures.
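The covariate adjustment itself is not reproduced here. For orientation, the unadjusted two-sample logrank test that the method generalizes can be sketched as follows: at each distinct event time, compare the observed events in group 1 with their expectation given the risk set, and sum the hypergeometric variances. The function name is hypothetical, and the sketch assumes both groups contribute to the risk sets so that the variance is positive.

```python
import math


def logrank_test(times, events, groups):
    """Unadjusted two-sample logrank test.

    times: follow-up times; events: 1 = event, 0 = censored;
    groups: 0 or 1 for each subject.  Returns (chi-square on 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    o_minus_e = 0.0
    var = 0.0
    for t in sorted({ti for ti, ei, _ in data if ei}):
        at_risk = [g for ti, _, g in data if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        d1 = sum(1 for ti, ei, g in data if ti == t and ei and g)
        o_minus_e += d1 - d * n1 / n          # observed minus expected in group 1
        if n > 1:                             # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```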

20.
Exhaustive testing of all possible SNP pairs in a genome‐wide association study (GWAS) generally yields low power to detect gene‐gene (G × G) interactions because of small effect sizes and stringent requirements for multiple‐testing correction. We introduce a new two‐step procedure for testing G × G interactions in case‐control GWAS to detect interacting single nucleotide polymorphisms (SNPs) regardless of their marginal effects. In an initial screening step, all SNP pairs are tested for gene‐gene association in the combined sample of cases and controls. In the second step, the pairs that pass the screening are followed up with a traditional test for G × G interaction. We show that the two‐step method is substantially more powerful to detect G × G interactions than the exhaustive testing approach. For example, with 2,000 cases and 2,000 controls, the two‐step method can have more than 90% power to detect an interaction odds ratio of 2.0 compared to less than 50% power for the exhaustive testing approach. Moreover, we show that a hybrid two‐step approach that combines our newly proposed two‐step test and the two‐step test that screens for marginal effects retains the best power properties of both. The two‐step procedures we introduce have the potential to uncover genetic signals that have not been previously identified in an initial single‐SNP GWAS. We demonstrate the computational feasibility of the two‐step G × G procedure by performing a G × G scan in the asthma GWAS of the University of Southern California Children's Health Study.
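The two-step logic can be sketched with precomputed p-values: step 1 screens all pairs, and step 2 applies a Bonferroni correction for the number of pairs that survive rather than for all pairs. Validity of that reduced correction rests on the screening and interaction statistics being independent under the null, which is assumed here; the SNP identifiers, thresholds, and function name are hypothetical.

```python
def two_step_scan(pairs, alpha=0.05, screen_alpha=1e-3):
    """Two-step interaction scan on precomputed p-values.

    pairs maps (snp_a, snp_b) -> (screening p-value, interaction p-value).
    Step 1 keeps pairs whose screening p-value is below screen_alpha.
    Step 2 tests only the survivors, Bonferroni-corrected for their number.
    """
    survivors = [pair for pair, (p_screen, _) in pairs.items() if p_screen < screen_alpha]
    if not survivors:
        return []
    threshold = alpha / len(survivors)
    return sorted(pair for pair in survivors if pairs[pair][1] < threshold)


pairs = {  # hypothetical p-values
    ("rs1", "rs2"): (1e-5, 0.01),
    ("rs1", "rs3"): (0.5, 1e-8),
    ("rs2", "rs3"): (1e-4, 0.03),
}
print(two_step_scan(pairs))   # [('rs1', 'rs2')]
```

In this toy input the pair with the smallest interaction p-value is never tested because it fails the screen, which illustrates why the choice of screening statistic matters.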
