Similar articles
20 matching records found.
1.
The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated with some phenotypes. We show that applying FDR‐controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
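As a rough illustration of the hierarchical idea (not the authors' exact procedure), the sketch below first combines each variant's phenotype p-values with a Simes statistic, selects variants by Benjamini–Hochberg, and then runs a second BH pass within each selected variant at a level shrunk by the selection fraction; the combination rule, the level adjustment, and all data are illustrative assumptions.

```python
import numpy as np

def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: boolean mask of rejections at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

def simes(pvals):
    """Simes combination p-value for one variant across all phenotypes."""
    p = np.sort(np.asarray(pvals))
    return np.min(p * p.size / np.arange(1, p.size + 1))

def hierarchical_fdr(pmat, q=0.05):
    """pmat: variants x phenotypes matrix of association p-values."""
    n_var = pmat.shape[0]
    selected = bh_reject(np.array([simes(row) for row in pmat]), q)   # level 1: variants
    n_sel = selected.sum()
    within = {i: bh_reject(pmat[i], q * n_sel / n_var)                # level 2: phenotypes
              for i in np.where(selected)[0]}
    return selected, within

# toy example: 1,000 variants, 5 phenotypes, 10 variants affecting 2 phenotypes each
rng = np.random.default_rng(0)
pmat = rng.uniform(size=(1000, 5))
pmat[:10, :2] = rng.uniform(0, 1e-4, size=(10, 2))
selected, within = hierarchical_fdr(pmat)
print(selected.sum(), "variants selected;",
      sum(w.sum() for w in within.values()), "phenotype hits")
```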

2.
Current analysis of event‐related potentials (ERP) data is usually based on the a priori selection of channels and time windows of interest for studying the differences between experimental conditions in the spatio‐temporal domain. In this work we put forward a new strategy designed for situations when there is no a priori information about ‘when’ and ‘where’ these differences appear in the spatio‐temporal domain, which requires simultaneously testing numerous hypotheses and thus increases the risk of false positives. This issue is known as the problem of multiple comparisons and has been managed with methods such as the permutation test and procedures that control the false discovery rate (FDR). Although the former has been applied previously, to our knowledge, FDR methods have not been introduced into ERP data analysis. Here we compare the performance (on simulated and real data) of the permutation test and two FDR methods (Benjamini and Hochberg (BH) and local‐fdr, by Efron). All these methods have been shown to be valid for dealing with the problem of multiple comparisons in ERP analysis, avoiding the ad hoc selection of channels and/or time windows. FDR methods are a good alternative to the common and computationally more expensive permutation test. The BH method for independent tests gave the best overall performance regarding the balance between type I and type II errors. The local‐fdr method is preferable for high dimensional (multichannel) problems where most of the tests conform to the empirical null hypothesis. Differences among the methods according to assumptions, null distributions and dimensionality of the problem are also discussed. Copyright © 2009 John Wiley & Sons, Ltd.
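For readers unfamiliar with applying BH to a full spatio-temporal grid, here is a minimal mass-univariate sketch on simulated ERP-like data (channels × time points, two conditions); it uses the standard multipletests implementation in statsmodels rather than any code from the paper, and the array sizes and effect location are arbitrary.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# hypothetical ERP amplitudes: trials x channels x time points, two conditions
cond_a = rng.normal(0.0, 1.0, size=(40, 32, 200))
cond_b = rng.normal(0.0, 1.0, size=(40, 32, 200))
cond_b[:, 10:14, 80:120] += 0.8          # simulated effect in a spatio-temporal cluster

# mass-univariate two-sample t-test at every (channel, time) point
t, p = stats.ttest_ind(cond_a, cond_b, axis=0)

# BH correction over all channel x time tests, with no a priori window selection
reject, p_adj, _, _ = multipletests(p.ravel(), alpha=0.05, method="fdr_bh")
sig = reject.reshape(p.shape)
print("significant (channel, time) points:", sig.sum())
```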

3.
We consider the problem of simultaneously testing multiple one-sided null hypotheses. Single-step procedures, such as the Bonferroni test, are characterized by the fact that the rejection or non-rejection of a null hypothesis does not take the decision for any other hypothesis into account. For stepwise test procedures, such as the Holm procedure, the rejection or non-rejection of a null hypothesis may depend on the decision of other hypotheses. It is well known that stepwise test procedures are by construction more powerful than their single-step counterparts. This power advantage, however, comes only at the cost of increased difficulties in constructing compatible simultaneous confidence intervals for the parameters of interest. For example, such simultaneous confidence intervals are easily obtained for the Bonferroni method, but surprisingly hard to derive for the Holm procedure. In this paper, we discuss the inherent problems and show that ad hoc solutions used in practice typically do not control the pre-specified simultaneous confidence level. Instead, we derive simultaneous confidence intervals that are compatible with a certain class of closed test procedures using weighted Bonferroni tests for each intersection hypothesis. The class of multiple test procedures covered in this paper includes gatekeeping procedures based on Bonferroni adjustments, fixed sequence procedures, the simple weighted or unweighted Bonferroni procedure by Holm and the fallback procedure. We illustrate the results with a numerical example.
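A small sketch of the single-step versus stepwise contrast, assuming z-statistics for three one-sided hypotheses: Holm can reject hypotheses that Bonferroni does not, yet the easily constructed Bonferroni-style simultaneous lower bounds need not reflect those extra rejections, which is the compatibility problem the paper addresses. The numbers are invented, and the compatible intervals derived in the paper are not reproduced.

```python
import numpy as np
from scipy import stats

def bonferroni(p):
    """Bonferroni adjusted p-values."""
    return np.minimum(np.asarray(p) * len(p), 1.0)

def holm(p):
    """Holm step-down adjusted p-values."""
    p = np.asarray(p)
    m = len(p)
    adj = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):        # smallest p first
        running_max = max(running_max, (m - rank) * p[idx])
        adj[idx] = min(running_max, 1.0)
    return adj

# three one-sided hypotheses H_i: theta_i <= 0, overall level alpha = 0.025 (invented numbers)
est = np.array([1.8, 1.1, 1.2])                  # estimates of theta_i
se = np.array([0.6, 0.5, 0.5])                   # standard errors
p = stats.norm.sf(est / se)                      # one-sided p-values
alpha = 0.025

print("Bonferroni rejections:", bonferroni(p) <= alpha)
print("Holm rejections:      ", holm(p) <= alpha)
# Bonferroni-compatible simultaneous one-sided lower bounds at level 1 - alpha:
lower = est - stats.norm.ppf(1 - alpha / len(p)) * se
print("simultaneous lower bounds:", lower.round(3))
# note: Holm rejects the second hypothesis, yet its Bonferroni-style lower bound is
# negative -- exactly the tension between stepwise tests and simple simultaneous intervals
```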

4.
The multiplicity problem has become increasingly important in genetic studies as the capacity for high-throughput genotyping has increased. The control of the False Discovery Rate (FDR) (Benjamini and Hochberg [1995] J. R. Stat. Soc. Ser. B 57:289-300) has been adopted to address the problems of false positive control and low power inherent in high-volume genome-wide linkage and association studies. In many genetic studies, there is often a natural stratification of the m hypotheses to be tested. Given the FDR framework and the presence of such stratification, we investigate the performance of a stratified false discovery control approach (i.e. control or estimate FDR separately for each stratum) and compare it to the aggregated method (i.e. consider all hypotheses in a single stratum). Under the fixed rejection region framework (i.e. reject all hypotheses with unadjusted p-values less than a pre-specified level and then estimate FDR), we demonstrate that the aggregated FDR is a weighted average of the stratum-specific FDRs. Under the fixed FDR framework (i.e. reject as many hypotheses as possible and meanwhile control FDR at a pre-specified level), we specify a condition necessary for the expected total number of true positives under the stratified FDR method to be equal to or greater than that obtained from the aggregated FDR method. Application to a recent Genome-Wide Association (GWA) study by Maraganore et al. ([2005] Am. J. Hum. Genet. 77:685-693) illustrates the potential advantages of control or estimation of FDR by stratum. Our analyses also show that controlling FDR at a low rate, e.g. 5% or 10%, may not be feasible for some GWA studies.
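A minimal sketch of the fixed-FDR comparison described here: run BH once over all hypotheses (aggregated) versus separately within each stratum at the same nominal level. The two strata, their sizes, and the signal enrichment are simulated assumptions, not the GWA data analyzed in the paper.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)

def stratum_pvalues(m, n_true, effect):
    """Two-sided p-values for m tests, the first n_true of which carry a shifted mean."""
    z = rng.normal(size=m)
    z[:n_true] += effect
    return 2 * norm.sf(np.abs(z))

# stratum 1: small candidate-gene panel enriched for signal; stratum 2: genome-wide backbone
p1 = stratum_pvalues(1_000, 50, 4.0)
p2 = stratum_pvalues(99_000, 50, 4.0)

q = 0.05
rej_all, *_ = multipletests(np.concatenate([p1, p2]), alpha=q, method="fdr_bh")  # aggregated
rej_s1, *_ = multipletests(p1, alpha=q, method="fdr_bh")                         # stratified
rej_s2, *_ = multipletests(p2, alpha=q, method="fdr_bh")

print("aggregated rejections:", rej_all.sum())
print("stratified rejections:", rej_s1.sum() + rej_s2.sum())
```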

5.
One of the main roles of omics-based association studies with high-throughput technologies is to screen out relevant molecular features, such as genetic variants, genes, and proteins, from a large pool of candidate features based on their associations with the phenotype of interest. Typically, screened features are subject to validation studies using more established or conventional assays, where the number of evaluable features is relatively limited, so that there may exist a fixed number of features measurable by these assays. Such a limitation necessitates narrowing a feature set down to a fixed size, following an initial screening analysis via multiple testing where adjustment for multiplicity is made. We propose a two-stage screening approach to control the false discovery rate (FDR) for a feature set with fixed size that is subject to validation studies, rather than for a feature set from the initial screening analysis. Out of the feature set selected in the first stage with a relaxed FDR level, a fraction of features with the greatest statistical significance is selected first. For the remaining feature set, features are selected based on biological consideration only, without regard to any statistical information, which allows evaluating the FDR level for the finally selected feature set with fixed size. Improvement in power under the proposed two-stage screening approach is also discussed. Simulation experiments based on parametric models and real microarray datasets demonstrated a substantial increase in the number of features screened for biological consideration compared with the standard screening approach, allowing for more extensive and in-depth biological investigations in omics association studies.
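A schematic sketch of the two-stage idea as described: a relaxed-FDR first stage defines the candidate pool, a fraction of the fixed-size validation set is filled by statistical significance, and the remaining slots are filled from the pool by biological priority alone. The split fraction, the relaxed level, and the biology_rank input are placeholders.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

def two_stage_screen(pvals, biology_rank, k_fixed, q_relaxed=0.20, stat_fraction=0.5):
    """Return indices of a fixed-size feature set to forward to a validation assay.

    pvals        : screening p-values for all candidate features
    biology_rank : externally supplied biological priority (lower = more interesting),
                   assumed independent of the statistics
    k_fixed      : number of features the validation assay can measure
    """
    pvals = np.asarray(pvals)
    # stage 1: relaxed FDR screen defines the candidate pool
    passed, *_ = multipletests(pvals, alpha=q_relaxed, method="fdr_bh")
    pool = np.where(passed)[0]
    # stage 2a: the most significant fraction of the fixed-size set, chosen statistically
    k_stat = min(int(round(stat_fraction * k_fixed)), pool.size)
    by_stat = pool[np.argsort(pvals[pool])[:k_stat]]
    # stage 2b: remaining slots filled from the pool by biological priority only
    remaining = np.setdiff1d(pool, by_stat)
    k_bio = min(k_fixed - k_stat, remaining.size)
    by_bio = remaining[np.argsort(np.asarray(biology_rank)[remaining])[:k_bio]]
    return np.concatenate([by_stat, by_bio])

rng = np.random.default_rng(3)
p = rng.uniform(size=5_000); p[:80] = rng.uniform(0, 1e-3, 80)
bio = rng.permutation(5_000)
chosen = two_stage_screen(p, bio, k_fixed=96)
print(len(chosen), "features forwarded to the validation assay")
```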

6.
While data sets based on dense genome scans are becoming increasingly common, there are many theoretical questions that remain unanswered. How can a large number of markers in high linkage disequilibrium (LD) and rare disease variants be simulated efficiently? How should markers in high LD be analyzed: individually or jointly? Are there fast and simple methods to adjust for correlation of tests? What is the power penalty for conservative Bonferroni adjustments? Assuming that association scans are adequately powered, we attempt to answer these questions. Performance of single‐point and multipoint tests, and their hybrids, is investigated using two simulation designs. The first simulation design uses theoretically derived LD patterns. The second design uses LD patterns based on real data. For the theoretical simulations we used polychoric correlation as a measure of LD to facilitate simulation of markers in LD and rare disease variants. Based on the simulation results of the two studies, we conclude that statistical tests assuming only additive genotype effects (i.e. Armitage and especially multipoint T2) should be used cautiously due to their suboptimal power in certain settings. A false discovery rate (FDR)‐adjusted combination of tests for additive, dominant and recessive effects had close to optimal power. However, the common genotypic χ2 test performed adequately and could be used in lieu of the FDR combination. While some hybrid methods yield (sometimes spectacularly) higher power they are computationally intensive. We also propose an “exact” method to adjust for multiple testing, which yields nominally higher power than the Bonferroni correction. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
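To make the model-combination idea concrete, here is a sketch that computes per-SNP p-values under additive, dominant, and recessive codings and applies a single BH adjustment over all (SNP, model) tests. The logistic-regression Wald test and collapsed chi-square tests merely stand in for the Armitage and genotypic statistics discussed in the abstract, and the simulated data are arbitrary.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2_contingency
from statsmodels.stats.multitest import multipletests

def model_pvalues(g, case):
    """P-values for additive, dominant and recessive codings of one SNP (genotypes 0/1/2)."""
    pvals = []
    # additive coding: Wald test from a logistic regression on allele dosage
    fit = sm.Logit(case, sm.add_constant(g.astype(float))).fit(disp=0)
    pvals.append(fit.pvalues[1])
    # dominant (carrier vs non-carrier) and recessive (homozygote vs rest) codings
    for coding in (g > 0, g == 2):
        table = np.array([[np.sum(coding & (case == 1)), np.sum(~coding & (case == 1))],
                          [np.sum(coding & (case == 0)), np.sum(~coding & (case == 0))]])
        pvals.append(chi2_contingency(table, correction=False)[1])
    return pvals

rng = np.random.default_rng(4)
n, n_snp = 2_000, 50
geno = rng.binomial(2, 0.3, size=(n, n_snp))
logit = -0.5 + 0.8 * (geno[:, 0] == 2)             # a purely recessive effect at SNP 0
case = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

all_p = np.array([model_pvalues(geno[:, j], case) for j in range(n_snp)])   # n_snp x 3
# one FDR adjustment over every (SNP, model) test; flag a SNP if any coding survives
reject, *_ = multipletests(all_p.ravel(), alpha=0.05, method="fdr_bh")
print("flagged SNPs:", np.where(reject.reshape(n_snp, 3).any(axis=1))[0])
```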

7.
Genetic association studies are popular for identifying genetic variants, such as single nucleotide polymorphisms (SNPs), that are associated with complex traits. Statistical tests are commonly performed one SNP at a time with an assumed mode of inheritance such as a recessive, additive, or dominant genetic model. Such analysis can result in inadequate power when the employed model deviates from the underlying true genetic model. We propose an integrative association test procedure under a generalized linear model framework to flexibly model the data from the above three common genetic models and beyond. A computationally efficient resampling procedure is adopted to estimate the null distribution of the proposed test statistic. Simulation results show that our methods maintain the Type I error rate irrespective of the existence of confounding covariates and achieve adequate power compared to the methods with the true genetic model. The new methods are applied to two genetic studies, on resistance to severe malaria and on sarcoidosis.

8.
Joint adjustment of cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in many existing sparse regression models. We developed a unified sparse regression (USR) to incorporate prior information and jointly adjust for cryptic relatedness, population structure, and other environmental covariates. Our USR models cryptic relatedness as a random effect and population structure as a fixed effect, and utilizes weighted penalties to incorporate prior knowledge. As demonstrated by extensive simulations, our USR algorithm can discover more true causal variants and maintain a lower false discovery rate than several commonly used feature selection methods. It can handle both rare and common variants simultaneously. Applying our USR algorithm to DNA sequence data of Mexican Americans from GAW18, we replicated three hypertension pathways, demonstrating its effectiveness in identifying susceptibility genetic variants.

9.
Recent work on prospective power and sample size calculations for analyses of high‐dimensional gene expression data that control the false discovery rate (FDR) focuses on the average power over all the truly nonnull hypotheses, or equivalently, the expected proportion of nonnull hypotheses rejected. Using another characterization of power, we adapt Efron's ([2007] Ann Stat 35:1351–1377) empirical Bayes approach to post hoc power calculation to develop a method for prospective calculation of the “identification power” for individual genes. This is the probability that a gene with a given true degree of association with clinical outcome or state will be included in a set within which the FDR is controlled at a specified level. An example calculation using proportional hazards regression highlights the effects of large numbers of genes with little or no association on the identification power for individual genes with substantial association.

10.
When simultaneously testing multiple hypotheses, the usual approach in the context of confirmatory clinical trials is to control the familywise error rate (FWER), which bounds the probability of making at least one false rejection. In many trial settings, these hypotheses will additionally have a hierarchical structure that reflects the relative importance and links between different clinical objectives. The graphical approach of Bretz et al. (2009) is a flexible and easily communicable way of controlling the FWER while respecting complex trial objectives and multiple structured hypotheses. However, the FWER can be a very stringent criterion that leads to procedures with low power, and may not be appropriate in exploratory trial settings. This motivates controlling generalized error rates, particularly when the number of hypotheses tested is no longer small. We consider the generalized familywise error rate (k-FWER), which is the probability of making k or more false rejections, as well as the tail probability of the false discovery proportion (FDP), which is the probability that the proportion of false rejections is greater than some threshold. We also consider asymptotic control of the false discovery rate, which is the expectation of the FDP. In this article, we show how to control these generalized error rates when using the graphical approach and its extensions. We demonstrate the utility of the resulting graphical procedures on three clinical trial case studies.
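As one concrete example of a generalized error rate mentioned here, the sketch below implements a step-down k-FWER test using the generalized-Holm critical constants usually attributed to Lehmann and Romano. This is a standalone illustration, not the graphical procedures developed in the article, and the constants should be checked against the original reference before use.

```python
import numpy as np

def k_fwer_stepdown(pvals, k=2, alpha=0.05):
    """Step-down test controlling P(k or more false rejections) <= alpha.

    Critical constants follow the generalized-Holm form: alpha_i = k*alpha/m for i <= k,
    and k*alpha/(m + k - i) otherwise (Lehmann-Romano style); returns a boolean
    rejection mask in the original order of the p-values.
    """
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    i = np.arange(1, m + 1)
    crit = np.where(i <= k, k * alpha / m, k * alpha / (m + k - i))
    passed = p[order] <= crit
    # step down: keep rejecting until the first sorted p-value misses its constant
    n_reject = m if passed.all() else int(np.argmin(passed))
    reject = np.zeros(m, dtype=bool)
    reject[order[:n_reject]] = True
    return reject

rng = np.random.default_rng(5)
p = np.concatenate([rng.uniform(0, 1e-3, 8), rng.uniform(size=92)])
print("k=1 (ordinary Holm / FWER):", k_fwer_stepdown(p, k=1).sum(), "rejections")
print("k=3 (3-FWER):              ", k_fwer_stepdown(p, k=3).sum(), "rejections")
```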

11.
The original definitions of false discovery rate (FDR) and false non-discovery rate (FNR) can be understood as the frequentist risks of false rejections and false non-rejections, respectively, conditional on the unknown parameter, while the Bayesian posterior FDR and posterior FNR are conditioned on the data. From a Bayesian point of view, it seems natural to take into account the uncertainties in both the parameter and the data. In this spirit, we propose averaging out the frequentist risks of false rejections and false non-rejections with respect to some prior distribution of the parameters to obtain the average FDR (AFDR) and average FNR (AFNR), respectively. A linear combination of the AFDR and AFNR, called the average Bayes error rate (ABER), is considered as an overall risk. Some useful formulas for the AFDR, AFNR and ABER are developed for normal samples with hierarchical mixture priors. The idea of finding threshold values by minimizing the ABER or controlling the AFDR is illustrated using a gene expression data set. Simulation studies show that the proposed approaches are more powerful and robust than the widely used FDR method.
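In symbols, the quantities described can be written roughly as follows (a sketch inferred from the abstract; FDR(θ) and FNR(θ) denote the frequentist risks conditional on the parameter θ, π is the prior, and the weight w in the linear combination is left generic because the abstract does not specify it):

```latex
\mathrm{AFDR} = \int \mathrm{FDR}(\theta)\,\mathrm{d}\pi(\theta), \qquad
\mathrm{AFNR} = \int \mathrm{FNR}(\theta)\,\mathrm{d}\pi(\theta), \qquad
\mathrm{ABER} = w\,\mathrm{AFDR} + (1-w)\,\mathrm{AFNR}, \quad 0 < w < 1 .
```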

12.
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross‐validated log‐likelihood (max‐cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness‐of‐fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to a reduction of the FDR without a large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one‐standard‐error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited rise in the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd.
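A schematic sketch of the type of criterion described: instead of the max-cvl rule, pick λ by maximizing the cross-validated log partial likelihood minus a parsimony penalty on the number of selected biomarkers. The penalty form, the weight gamma, and the toy cvl path below are placeholders, not the specific extension defined in the paper.

```python
import numpy as np

def penalized_cvl_choice(lambdas, cvl, n_selected, gamma=2.0):
    """Pick lambda maximizing cvl(lambda) - gamma * (number of selected biomarkers).

    cvl        : cross-validated log partial likelihoods along the lambda path
                 (computed elsewhere by a penalized Cox fitter)
    n_selected : number of nonzero coefficients at each lambda
    gamma      : placeholder trade-off weight between fit and parsimony; the paper's
                 extension defines its own penalization term
    """
    lambdas, cvl, n_selected = map(np.asarray, (lambdas, cvl, n_selected))
    criterion = cvl - gamma * n_selected
    return lambdas[int(np.argmax(criterion))], criterion

# toy illustration with an invented cvl path (not real survival data)
lams = np.logspace(-2, 0, 20)
cvl = -500 + 30 * np.exp(-3 * lams) - 5 * lams            # fake cross-validated likelihoods
nsel = np.round(40 * np.exp(-4 * lams)).astype(int)       # fake model sizes along the path
lam_maxcvl = lams[int(np.argmax(cvl))]
lam_pen, _ = penalized_cvl_choice(lams, cvl, nsel)
print(f"max-cvl lambda: {lam_maxcvl:.3f}   penalized-cvl lambda: {lam_pen:.3f}")
```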

13.
When a large number of hypotheses are investigated, we propose multi-stage designs in which promising hypotheses are screened at each interim analysis and carried forward for investigation in further stages. Given a fixed overall number of observations, this allows one to spend more observations on promising hypotheses than with single-stage designs, where the observations are equally distributed among all considered hypotheses. We propose multi-stage procedures controlling either the family-wise error rate (FWER) or the false discovery rate (FDR) and derive asymptotically optimal stopping boundaries and sample size allocations (across stages) to maximize the power of the procedure. Optimized two-stage designs lead to a considerable increase in power compared with the classical single-stage design. Going from two to three stages additionally leads to a distinctive increase in power. Adding a fourth stage leads to a further improvement, which is, however, less pronounced. Surprisingly, we found only small differences in power between optimized integrated designs, where the data of all stages are used in the final test statistics, and optimized pilot designs, where only the data from the final stage are used for testing. However, the integrated design controlling the FDR appeared to be more robust against misspecifications in the planning phase. Additionally, we found that with an increasing number of stages the drop in power when controlling the FWER instead of the FDR becomes negligible. Our investigations show that the crucial point is not the choice of the error rate or the type of design, but the sequential nature of the trial, in which non-promising hypotheses are dropped in the early phases of the experiment.
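A minimal sketch of a two-stage "pilot" design of the kind compared here: stage 1 screens all hypotheses on part of the observation budget, stage 2 reallocates the rest to the survivors and applies BH to the stage-2 data only. The budget split and screening threshold are arbitrary choices, not the optimized boundaries derived in the paper.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(6)
m, n_total = 10_000, 100          # hypotheses, observation budget per hypothesis
mu = np.zeros(m); mu[:200] = 0.5  # 200 non-null hypotheses with a modest effect

def zstat(effects, n):
    """One-sample z-statistics based on n observations per hypothesis."""
    xbar = effects + rng.normal(scale=1 / np.sqrt(n), size=effects.size)
    return xbar * np.sqrt(n)

# single-stage design: the whole budget spread over all hypotheses
p_single = norm.sf(zstat(mu, n_total))
rej_single, *_ = multipletests(p_single, alpha=0.05, method="fdr_bh")

# two-stage pilot design: 30% of the budget to screen, the rest to the survivors only
n1 = int(0.3 * n_total)
keep = norm.sf(zstat(mu, n1)) < 0.10                      # screening threshold (placeholder)
n2 = int((n_total - n1) * m / max(keep.sum(), 1))         # reallocate the remaining budget
p_stage2 = norm.sf(zstat(mu[keep], n2))
rej_two, *_ = multipletests(p_stage2, alpha=0.05, method="fdr_bh")

# count rejections among the truly non-null hypotheses (original index < 200)
print("single-stage true positives:", rej_single[:200].sum())
print("two-stage true positives:   ", rej_two[np.where(keep)[0] < 200].sum())
```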

14.
There are numerous alternatives to the so-called Bonferroni adjustment to control for familywise Type I error among multiple tests. Yet, for the most part, these approaches disregard the correlation among endpoints. This can prove to be a conservative hypothesis testing strategy if the null hypothesis is false. The James procedure was proposed to account for the correlation structure among multiple continuous endpoints. Here, a simulation study evaluates the statistical power of the Hochberg and James adjustment strategies relative to that of the Bonferroni approach when used for multiple correlated binary variables. The simulations demonstrate that relative to the Bonferroni approach, neither alternative sacrifices power. The Hochberg approach has more statistical power for rho …
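A minimal sketch of the Hochberg step-up adjustment next to Bonferroni for a handful of endpoint p-values (the James procedure is not reproduced). With the invented p-values below, Hochberg rejects two endpoints at the 0.05 level where Bonferroni rejects only one.

```python
import numpy as np

def hochberg_adjust(pvals):
    """Hochberg step-up adjusted p-values (valid under non-negative dependence)."""
    p = np.asarray(pvals)
    m = p.size
    adj = np.empty(m)
    running_min = 1.0
    for rank, idx in enumerate(np.argsort(p)[::-1]):   # largest p first, multipliers 1..m
        running_min = min(running_min, (rank + 1) * p[idx])
        adj[idx] = running_min
    return adj

p = np.array([0.010, 0.015, 0.040, 0.060])   # invented p-values for four binary endpoints
print("Bonferroni adjusted:", np.minimum(p * p.size, 1).round(3))
print("Hochberg adjusted:  ", hochberg_adjust(p).round(3))
```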

15.
As whole-exome/genome sequencing data become increasingly available in genetic epidemiology research consortia, there is emerging interest in testing the interactions between rare genetic variants and environmental exposures that modify the risk of complex diseases. However, testing rare-variant–based gene-by-environment interactions (GxE) is more challenging than testing the genetic main effects due to the difficulty in correctly estimating the latter under the null hypothesis of no GxE effects and the presence of neutral variants. In response, we have developed a family of powerful and data-adaptive GxE tests, called “aGE” tests, in the framework of the adaptive powered score test, originally proposed for testing the genetic main effects. Using extensive simulations, we show that aGE tests can control the type I error rate in the presence of a large number of neutral variants or a nonlinear environmental main effect, and the power is more resilient to the inclusion of neutral variants than that of existing methods. We demonstrate the performance of the proposed aGE tests using Pancreatic Cancer Case-Control Consortium Exome Chip data. An R package “aGE” is available at http://github.com/ytzhong/projects/ .

16.
Recognizing that multiple genes are likely responsible for common complex traits, statistical methods are needed to rapidly screen for either interacting genes or locus heterogeneity in genetic linkage data. To achieve this, some investigators have proposed examining the correlation of pedigree linkage scores between pairs of chromosomal regions, because large positive correlations suggest interacting loci and large negative correlations suggest locus heterogeneity (Cox et al. [1999]; Maclean et al. [1993]). However, the statistical significance of these extreme correlations has been difficult to determine due to the autocorrelation of linkage scores along chromosomes. In this study, we provide novel solutions to this problem by using results from random field theory, combined with simulations to determine the null correlation for syntenic loci. Simulations illustrate that our new methods control the Type-I error rates, so that one can avoid the extremely conservative Bonferroni correction, as well as the extremely time-consuming permutational method to compute P-values for non-syntenic loci. Application of these methods to prostate cancer linkage studies illustrates interpretation of results and provides insights into the impact of marker information content on the resulting statistical correlations, and ultimately the asymptotic P-values.

17.
Genome‐wide association studies (GWAS) have been widely used to identify genetic effects on complex diseases or traits. Most currently used methods are based on separate single‐nucleotide polymorphism (SNP) analyses. Because this approach requires correction for multiple testing to avoid excessive false‐positive results, it suffers from reduced power to detect weak genetic effects under limited sample size. To increase the power to detect multiple weak genetic factors and reduce false‐positive results caused by multiple tests and dependence among test statistics, a modified forward multiple regression (MFMR) approach is proposed. Simulation studies show that MFMR has higher power than the Bonferroni and false discovery rate procedures for detecting moderate and weak genetic effects, and MFMR retains an acceptable false‐positive rate even if causal SNPs are correlated with many SNPs due to population stratification or other unknown reasons. Genet. Epidemiol. 33:518–525, 2009. © 2009 Wiley‐Liss, Inc.
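A schematic sketch of forward multiple regression over SNPs with a p-value entry criterion, which is the general idea behind MFMR; the entry threshold, stopping rule, and simulated data are placeholders rather than the published modification.

```python
import numpy as np
import statsmodels.api as sm

def forward_snp_selection(y, G, p_enter=1e-4, max_terms=20):
    """Greedy forward selection of SNP columns of G into a linear model for y."""
    n, m = G.shape
    selected = []
    while len(selected) < max_terms:
        best_p, best_j = np.inf, None
        for j in range(m):
            if j in selected:
                continue
            X = sm.add_constant(G[:, selected + [j]])
            pj = sm.OLS(y, X).fit().pvalues[-1]      # p-value of the candidate SNP
            if pj < best_p:
                best_p, best_j = pj, j
        if best_j is None or best_p > p_enter:       # stop when no SNP meets the entry criterion
            break
        selected.append(best_j)
    return selected

rng = np.random.default_rng(7)
n, m = 1_000, 500
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = 0.4 * G[:, 3] + 0.3 * G[:, 7] + rng.normal(size=n)
print("selected SNPs:", forward_snp_selection(y, G))
```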

18.
Along with the accumulation of data on genetic variants and biomedical phenotypes in the genome era, statistical identification of pleiotropy is of growing interest for dissecting and understanding genetic correlations between complex traits. We propose a novel method for estimating and testing the pleiotropic effect of a genetic variant on two quantitative traits. Based on a covariance decomposition and estimation, our method quantifies pleiotropy as the portion of between‐trait correlation explained by the same genetic variant. Unlike most multiple‐trait methods that assess potential pleiotropy (i.e., whether a variant contributes to at least one trait), our method formulates a statistic that tests exact pleiotropy (i.e., whether a variant contributes to both of two traits). We developed two approaches (a regression approach and a bootstrapping approach) for this test and investigated their statistical properties, in comparison with other potential pleiotropy test methods. Our simulations show that the regression approach produces correct P‐values under both the complete null (i.e., a variant has no effect on either trait) and the incomplete null (i.e., a variant has an effect on only one of two traits), but requires large sample sizes to achieve good power, whereas the bootstrapping approach has better power and produces conservative P‐values under the complete null. We demonstrate our method for detecting exact pleiotropy using a real GWAS dataset. Our method provides an easy‐to‐implement tool for measuring, testing, and understanding the pleiotropic effect of a single variant on the correlation architecture of two complex traits.
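A rough sketch of the covariance-decomposition idea, assuming the explained share for a variant g and traits y1, y2 is estimated as beta1*beta2*Var(g) divided by the trait covariance, with a nonparametric bootstrap interval; the estimator and the regression-based test in the paper may differ in detail.

```python
import numpy as np

def pleiotropy_share(g, y1, y2):
    """Share of cov(y1, y2) attributed to variant g: (beta1 * beta2 * var(g)) / cov(y1, y2)."""
    var_g = np.var(g, ddof=1)
    b1 = np.cov(g, y1)[0, 1] / var_g         # marginal regression slopes of each trait on g
    b2 = np.cov(g, y2)[0, 1] / var_g
    return b1 * b2 * var_g / np.cov(y1, y2)[0, 1]

def bootstrap_ci(g, y1, y2, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the explained share (resampling individuals)."""
    rng = np.random.default_rng(seed)
    n = len(g)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        draws.append(pleiotropy_share(g[idx], y1[idx], y2[idx]))
    return np.percentile(draws, [2.5, 97.5])

rng = np.random.default_rng(8)
n = 5_000
g = rng.binomial(2, 0.3, n).astype(float)
shared = rng.normal(size=n)                      # a non-genetic source of trait correlation
y1 = 0.3 * g + shared + rng.normal(size=n)       # the variant contributes to both traits
y2 = 0.2 * g + shared + rng.normal(size=n)
print("estimated share:", round(pleiotropy_share(g, y1, y2), 3))
print("95% bootstrap CI:", bootstrap_ci(g, y1, y2).round(3))
```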

19.
Objectives: Procedures for controlling the false positive rate when performing many hypothesis tests are commonplace in health and medical studies. Such procedures, most notably the Bonferroni adjustment, suffer from the problem that error rate control cannot be localized to individual tests, and that these procedures do not distinguish between exploratory and/or data-driven testing vs. hypothesis-driven testing. Instead, procedures derived from limiting false discovery rates may be a more appealing method to control error rates in multiple tests. Study Design and Setting: Controlling the false positive rate can lead to philosophical inconsistencies that can negatively impact the practice of reporting statistically significant findings. We demonstrate that the false discovery rate approach can overcome these inconsistencies and illustrate its benefit through an application to two recent health studies. Results: The false discovery rate approach is more powerful than methods like the Bonferroni procedure that control false positive rates. Controlling the false discovery rate in a study that arguably consisted of scientifically driven hypotheses found nearly as many significant results as without any adjustment, whereas the Bonferroni procedure found no significant results. Conclusion: Although still unfamiliar to many health researchers, the use of false discovery rate control in the context of multiple testing can provide a solid basis for drawing conclusions about statistical significance.
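A minimal illustration of the contrast described: on the same vector of p-values, the BH false discovery rate adjustment typically retains far more findings than the Bonferroni familywise correction. The p-values are simulated, not those of the health studies re-analyzed in the article.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(9)
# 50 hypothesis-driven tests, 15 of them with genuine but moderate effects
p = np.concatenate([rng.uniform(1e-4, 5e-3, 15), rng.uniform(size=35)])

for method, label in [("bonferroni", "Bonferroni"), ("fdr_bh", "BH (FDR)")]:
    reject, *_ = multipletests(p, alpha=0.05, method=method)
    print(f"{label:12s} significant findings: {reject.sum()}")
```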

20.
Multiple comparisons for microarray data
Objective: To introduce the false discovery rate (FDR) and related control methods for multiple comparisons of microarray data. Methods: Four FDR-controlling procedures (BH, BL, BY, and ALSU) were used to compare the differential expression of 3,226 genes between two groups of breast cancer patients. Results: Within their respective ranges of applicability, all four procedures controlled the FDR below 0.05; ranked by power from highest to lowest, they were ALSU > BH > BY > BL. The ALSU procedure, which incorporates an estimate of m0, is more reasonable: it not only improved power but also kept false positives well controlled. Conclusion: FDR control must be considered when comparing microarray data, together with improving statistical power. In multiple comparisons, controlling the FDR yields higher power than controlling the familywise error rate (FWER) and is more practical.
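A sketch comparing BH, BY, and an adaptive ("ALSU-type") linear step-up that plugs in an estimate of m0, the number of true null hypotheses. The Storey-type m0 estimator and the simulated 3,226 p-values are assumptions for illustration (the BL procedure is not implemented), so this is not a re-analysis of the breast cancer data.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """Adaptive linear step-up: BH run at level alpha * m / m0_hat.

    m0 is estimated with a Storey-type lambda estimator; this is one standard way to
    build an ALSU-type procedure and may differ from the paper's implementation.
    """
    p = np.asarray(pvals)
    m = p.size
    m0_hat = min(m, (np.sum(p > lam) + 1) / (1 - lam))
    reject, *_ = multipletests(p, alpha=alpha * m / m0_hat, method="fdr_bh")
    return reject

rng = np.random.default_rng(10)
# mimic the setting: 3,226 genes, a subset differentially expressed between two groups
p = np.concatenate([rng.uniform(0, 1e-3, 300), rng.uniform(size=2926)])

for name, rej in [
    ("BH", multipletests(p, 0.05, method="fdr_bh")[0]),
    ("BY", multipletests(p, 0.05, method="fdr_by")[0]),
    ("adaptive (ALSU-type)", adaptive_bh(p, 0.05)),
]:
    print(f"{name:22s} rejections: {rej.sum()}")
```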
