Similar Documents (20 results)
1.
In this paper, we propose a large-scale multiple testing procedure to find the significant sub-areas between two samples of curves automatically. The procedure is optimal in that it controls the directional false discovery rate at any specified level on a continuum asymptotically. By introducing a nonparametric Gaussian process regression model for the two-sided multiple test, the procedure is computationally inexpensive. It can cope with problems with multidimensional covariates and accommodate different sampling designs across the samples. We further propose the significant curve/surface, giving insight into dynamic significant differences between two curves. Simulation studies demonstrate that the proposed procedure enjoys superior performance with strong power and good directional error control. The procedure is also illustrated with an application to two executive function studies in hemiplegia.

2.
The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated with some phenotypes. We show that applying FDR-controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
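As a rough sketch of the hierarchical idea (not the authors' exact procedure), one can combine each variant's phenotype p-values with Simes' method, select variants by Benjamini-Hochberg (BH), and then test phenotypes within the selected variants at a selection-adjusted level in the spirit of Benjamini-Bogomolov; all names and the adjustment choice below are our assumptions.

```python
import numpy as np

def simes(pvals):
    """Simes combination p-value for one variant's family of phenotype tests."""
    p = np.sort(np.asarray(pvals))
    m = len(p)
    return np.min(p * m / np.arange(1, m + 1))

def bh_mask(pvals, q):
    """Benjamini-Hochberg step-up; returns a boolean rejection mask."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

def hierarchical_test(pval_matrix, q=0.05):
    """pval_matrix: variants x phenotypes. Stage 1: BH on per-variant Simes
    p-values selects variants. Stage 2: BH within each selected variant at the
    selection-adjusted level q * (#selected / #variants)."""
    variant_p = np.array([simes(row) for row in pval_matrix])
    selected = bh_mask(variant_p, q)
    level2 = q * selected.sum() / len(variant_p)
    return selected, {i: bh_mask(pval_matrix[i], level2)
                      for i in np.nonzero(selected)[0]}
```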

3.
Objectives: Procedures for controlling the false positive rate when performing many hypothesis tests are commonplace in health and medical studies. Such procedures, most notably the Bonferroni adjustment, suffer from the problem that error rate control cannot be localized to individual tests, and that these procedures do not distinguish between exploratory and/or data-driven testing vs. hypothesis-driven testing. Instead, procedures derived from limiting false discovery rates may be a more appealing method to control error rates in multiple tests.
Study Design and Setting: Controlling the false positive rate can lead to philosophical inconsistencies that can negatively impact the practice of reporting statistically significant findings. We demonstrate that the false discovery rate approach can overcome these inconsistencies and illustrate its benefit through an application to two recent health studies.
Results: The false discovery rate approach is more powerful than methods like the Bonferroni procedure that control false positive rates. Controlling the false discovery rate in a study that arguably consisted of scientifically driven hypotheses found nearly as many significant results as without any adjustment, whereas the Bonferroni procedure found no significant results.
Conclusion: Although still unfamiliar to many health researchers, the use of false discovery rate control in the context of multiple testing can provide a solid basis for drawing conclusions about statistical significance.
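A minimal sketch of the two adjustments being contrasted; the toy p-values are invented to mirror the reported pattern, where BH flags nearly as many tests as no adjustment at all while Bonferroni flags none.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Flag tests whose p-value survives the Bonferroni bound alpha/m."""
    p = np.asarray(pvals)
    return p <= alpha / len(p)

def benjamini_hochberg(pvals, q=0.05):
    """BH step-up: find the largest k with p_(k) <= k*q/m; reject the k smallest."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Ten hypothesis-driven tests with moderately small p-values (invented numbers).
p = np.array([0.006, 0.009, 0.013, 0.018, 0.022, 0.029, 0.35, 0.48, 0.62, 0.91])
print(bonferroni(p).sum())          # 0 flags: nothing survives 0.05/10 = 0.005
print(benjamini_hochberg(p).sum())  # 6 flags: all six nominal findings survive
```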

4.
Analyzing safety data from clinical trials to detect safety signals worth further examination involves testing multiple hypotheses, one for each observed adverse event (AE) type. These hypotheses have a hierarchical structure, because the AEs are classified into system organ classes, and the AEs are also likely correlated. Many approaches have been proposed to identify safety signals under the multiple testing framework while aiming to control the false discovery rate (FDR). FDR control concerns the expectation of the false discovery proportion (FDP); in practice, controlling the actual random variable FDP can be more relevant, and this has recently drawn much attention. In this paper, we propose a two-stage procedure for safety signal detection with direct control of the FDP: a permutation-based approach screens groups of AEs, and a second permutation-based step constructs simultaneous upper bounds for the FDP. Our simulation studies show that this new approach controls the FDP. We demonstrate our approach using data sets derived from a drug clinical trial.
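The published two-stage procedure is more involved; purely to illustrate the permutation ingredient, the following SAM-style sketch estimates the FDP at a given |t| threshold by permuting group labels. The array layout and function names are assumptions.

```python
import numpy as np

def perm_fdp_estimate(x_treat, x_ctrl, threshold, n_perm=1000, seed=0):
    """SAM-style permutation estimate of the FDP at a |t| threshold: the null
    exceedance count, averaged over label permutations, divided by the
    observed count. x_treat, x_ctrl: (subjects, AE types) score arrays."""
    rng = np.random.default_rng(seed)

    def abs_t(a, b):
        diff = a.mean(0) - b.mean(0)
        se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
        return np.abs(diff / se)

    observed = (abs_t(x_treat, x_ctrl) >= threshold).sum()
    pooled = np.vstack([x_treat, x_ctrl])
    n1 = len(x_treat)
    null = [(abs_t(pooled[idx[:n1]], pooled[idx[n1:]]) >= threshold).sum()
            for idx in (rng.permutation(len(pooled)) for _ in range(n_perm))]
    return np.mean(null) / max(observed, 1)
```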

5.
We address the problem of testing whether a possibly high-dimensional vector may act as a mediator between some exposure variable and the outcome of interest. We propose a global test for mediation, which combines a global test with the intersection-union principle. We discuss theoretical properties of our approach and conduct simulation studies that demonstrate that it performs as well as or better than its competitor. We also propose a multiple testing procedure, ScreenMin, that provides asymptotic control of either the familywise error rate or the false discovery rate when multiple groups of potential mediators are tested simultaneously. We apply our approach to data from a large Norwegian cohort study, where we examine the hypothesis that smoking increases the risk of lung cancer by modifying the level of DNA methylation.
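A sketch of the intersection-union logic together with a ScreenMin-flavoured two-step; the published ScreenMin threshold rules differ, and the screening cut-off c below is an arbitrary placeholder.

```python
import numpy as np

def iu_mediation_pvalue(p_exp_med, p_med_out):
    """Intersection-union test: mediation needs both the exposure-mediator and
    mediator-outcome links, so the valid combined p-value is their maximum."""
    return max(p_exp_med, p_med_out)

def screen_then_test(p1, p2, c=0.1, alpha=0.05):
    """ScreenMin-flavoured two-step (illustrative cut-off c, not the published
    rule): keep mediator groups with min(p1, p2) <= c, then Bonferroni-test
    the max p-values within the screened set for FWER control."""
    p1, p2 = np.asarray(p1), np.asarray(p2)
    screened = np.minimum(p1, p2) <= c
    k = max(int(screened.sum()), 1)
    return screened & (np.maximum(p1, p2) <= alpha / k)
```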

6.
This paper presents an overview of the current state of the art in multiple testing in genomics data from a user's perspective. We describe methods for familywise error control, false discovery rate control and false discovery proportion estimation and confidence, both conceptually and practically, and explain when to use which type of error rate. We elaborate on the assumptions underlying the methods and discuss pitfalls in the interpretation of results. In our discussion, we take into account the exploratory nature of genomics experiments, looking at selection of genes before or after testing, and at the role of validation experiments.

7.
When simultaneously testing multiple hypotheses, the usual approach in the context of confirmatory clinical trials is to control the familywise error rate (FWER), which bounds the probability of making at least one false rejection. In many trial settings, these hypotheses will additionally have a hierarchical structure that reflects the relative importance and links between different clinical objectives. The graphical approach of Bretz et al (2009) is a flexible and easily communicable way of controlling the FWER while respecting complex trial objectives and multiple structured hypotheses. However, the FWER can be a very stringent criterion that leads to procedures with low power, and may not be appropriate in exploratory trial settings. This motivates controlling generalized error rates, particularly when the number of hypotheses tested is no longer small. We consider the generalized familywise error rate (k-FWER), which is the probability of making k or more false rejections, as well as the tail probability of the false discovery proportion (FDP), which is the probability that the proportion of false rejections is greater than some threshold. We also consider asymptotic control of the false discovery rate, which is the expectation of the FDP. In this article, we show how to control these generalized error rates when using the graphical approach and its extensions. We demonstrate the utility of the resulting graphical procedures on three clinical trial case studies.
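For concreteness, the simplest k-FWER-controlling rule is the single-step generalized Bonferroni bound of Lehmann and Romano, sketched below; it is a generic textbook device, not the graphical procedure of this paper.

```python
import numpy as np

def k_fwer_bonferroni(pvals, k=3, alpha=0.05):
    """Single-step generalized Bonferroni (Lehmann-Romano): rejecting all
    p_i <= k*alpha/m guarantees P(k or more false rejections) <= alpha under
    arbitrary dependence; k = 1 recovers the ordinary Bonferroni bound."""
    p = np.asarray(pvals)
    return p <= k * alpha / len(p)

# Toy p-values skewed toward zero; relaxing k from 1 to 5 buys extra rejections.
p = np.random.default_rng(1).uniform(size=200) ** 2
print(k_fwer_bonferroni(p, k=1).sum(), k_fwer_bonferroni(p, k=5).sum())
```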

8.
Many gene expression data sets are based on two experiments in which the expression levels of the targeted genes are correlated across experiments. We consider problems whose objective is to find genes that are simultaneously upregulated or downregulated under both experiments. A Bayesian methodology is proposed based on directional multiple hypothesis testing. We propose a false discovery rate specific to the problem under consideration and construct a Bayes rule satisfying a false discovery rate criterion. The proposed method is compared with a traditional rule through simulation studies. We apply our methodology to two real examples involving microRNAs: in one, the targeted genes are simultaneously downregulated under both experiments; in the other, the targeted genes are downregulated in one experiment and upregulated in the other. We also discuss how the proposed methodology can be extended to more than two experiments.

9.
Optimal designs for two-stage genome-wide association studies
Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in stage 2. We describe a strategy for designing cost-effective two-stage GWA studies. Our strategy preserves much of the power of the corresponding one-stage design and minimizes the genotyping cost of the study while allowing for differences in per genotyping cost between stages 1 and 2. We show that the ratio of stage 2 to stage 1 per genotype cost can strongly influence both the optimal design and the genotyping cost of the study. Increasing the stage 2 per genotype cost shifts more of the genotyping and study cost to stage 1, and increases the cost of the study. This higher cost can be partially mitigated by adopting a design with reduced power while preserving the false positive rate or by increasing the false positive rate while preserving power. For example, reducing the power preserved in the two-stage design from 99 to 95% that of the one-stage design decreases the two-stage study cost by approximately 15%. Alternatively, the same cost savings can be had by relaxing the false positive rate by 2.5-fold, for example from 1/300,000 to 2.5/300,000, while retaining the same power.
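The cost structure described here reduces to simple arithmetic. The sketch below uses invented design numbers purely to show how the stage-2/stage-1 per-genotype cost ratio moves the total genotyping cost.

```python
def two_stage_cost(n, m, pi_samples, pi_markers, c1, c2):
    """Total genotyping cost of a two-stage GWA design: a fraction pi_samples
    of the n subjects is typed on all m markers at c1 per genotype in stage 1;
    the remaining subjects are typed on the fraction pi_markers of markers
    carried forward, at c2 per genotype, in stage 2."""
    return pi_samples * n * m * c1 + (1 - pi_samples) * n * pi_markers * m * c2

# Hypothetical design: 3,000 subjects, 300,000 markers, 40% of samples in
# stage 1, 0.5% of markers followed up. Raising the stage-2/stage-1
# per-genotype cost ratio shows how stage 2 starts to weigh on the budget.
for ratio in (1, 5, 20):
    print(ratio, two_stage_cost(3000, 300_000, 0.4, 0.005, 1.0, ratio))
```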

10.
The original definitions of false discovery rate (FDR) and false non-discovery rate (FNR) can be understood as the frequentist risks of false rejections and false non-rejections, respectively, conditional on the unknown parameter, while the Bayesian posterior FDR and posterior FNR are conditioned on the data. From a Bayesian point of view, it seems natural to take into account the uncertainties in both the parameter and the data. In this spirit, we propose averaging out the frequentist risks of false rejections and false non-rejections with respect to some prior distribution of the parameters to obtain the average FDR (AFDR) and average FNR (AFNR), respectively. A linear combination of the AFDR and AFNR, called the average Bayes error rate (ABER), is considered as an overall risk. Some useful formulas for the AFDR, AFNR and ABER are developed for normal samples with hierarchical mixture priors. The idea of finding threshold values by minimizing the ABER or controlling the AFDR is illustrated using a gene expression data set. Simulation studies show that the proposed approaches are more powerful and robust than the widely used FDR method.
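Reading the verbal definitions back into formulas (our reconstruction and notation, with generic non-negative weights rather than the paper's own symbols):

```latex
% pi(theta): prior on the unknown parameter; FDR(theta), FNR(theta): the
% frequentist risks conditional on theta.
\mathrm{AFDR} = \int \mathrm{FDR}(\theta)\,\pi(\theta)\,\mathrm{d}\theta, \qquad
\mathrm{AFNR} = \int \mathrm{FNR}(\theta)\,\pi(\theta)\,\mathrm{d}\theta,
\qquad
\mathrm{ABER} = w_1\,\mathrm{AFDR} + w_2\,\mathrm{AFNR}, \quad w_1, w_2 \ge 0.
```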

11.
Clinical studies of predictive diagnostic tests consider the evaluation of a single test and the comparison of two tests regarding their predictive accuracy of disease status. The positive predictive value (PPV) curve is used for assessing the probability of disease given a positive test result. The sequential property of a single PPV curve has been studied. However, in later stages of diagnostic test development, it is more interesting to compare the predictive accuracy of two tests. In this article, we propose a group sequential test for the comparison of PPV curves in paired designs where both diagnostic tests are applied to the same subjects. We first derive asymptotic properties of the sequential differences of two correlated empirical PPV curves under common case-control sampling. We then apply these results to develop a group sequential test procedure. The asymptotic results are also critical for deriving both the optimal sample size ratio and the minimal required sample sizes for the proposed procedure. Our simulation studies show that the proposed sequential testing maintains the nominal type I error rate in finite samples. The proposed design is illustrated in a hypothetical lung cancer predictive trial and in a cancer diagnostic trial.

12.

Objective

Researchers in health sciences and medicine often use cohort designs to study treatment effects and changes in outcome variables over time. The costs of these studies can be reduced by choosing an optimal number of repeated measurements over time and by selecting cohorts of subjects more efficiently with optimal design procedures. The objective of this study is to provide evidence on how to design large-scale cohort studies under budget constraints as efficiently as possible.

Study Design and Setting

A linear cost function for repeated measurements is proposed, and this cost function is used in the optimization procedure. For a given budget/cost, different designs for linear mixed-effects models are compared by means of their efficiency.

Results

We found that adding more repeated measures is only beneficial if the costs of selecting and measuring a new subject are much higher than the costs of obtaining an additional measurement for an already recruited subject. Even then, the gain in efficiency and power is not very large.

Conclusion

Adding more cohorts or repeated measurements does not necessarily lead to a gain in efficiency of the estimated model parameters. We offer a general guideline for the optimal choice of a cohort design in practice.
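The conclusion's trade-off can be made concrete with a small sketch under an assumed random-intercept model and the linear cost function B = n(c_subject + m * c_measure); all parameter values below are hypothetical.

```python
def var_of_mean(n, m, sigma_b2=1.0, sigma_e2=1.0):
    """Variance of the estimated overall mean under a random-intercept model:
    n subjects, m repeated measures each (compound-symmetric covariance)."""
    return (sigma_b2 + sigma_e2 / m) / n

def best_design(budget, c_subject, c_measure, m_grid=range(1, 21)):
    """Linear cost B = n * (c_subject + m * c_measure); return the design
    (variance, m repeats, n subjects) with the smallest variance on budget."""
    designs = [(var_of_mean(budget // (c_subject + m * c_measure), m), m,
                budget // (c_subject + m * c_measure))
               for m in m_grid if budget // (c_subject + m * c_measure) > 0]
    return min(designs)

# Expensive recruitment, cheap extra measurements: several repeats pay off.
print(best_design(10_000, c_subject=100, c_measure=5))
# Cheap recruitment, costly measurements: one measurement per subject wins.
print(best_design(10_000, c_subject=10, c_measure=10))
```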

13.
Comparative analyses of safety/tolerability data from a typical phase III randomized clinical trial generate multiple p-values associated with adverse experiences (AEs) across several body systems. A common approach is to 'flag' any AE with a p-value less than or equal to 0.05, ignoring the multiplicity problem. Despite the fact that this approach can result in excessive false discoveries (false positives), many researchers avoid a multiplicity adjustment to curtail the risk of missing true safety signals. We propose a new flagging mechanism that significantly lowers the false discovery rate (FDR) without materially compromising the power for detecting true signals, relative to the common no-adjustment approach. Our simple two-step procedure is an enhancement of the Mehrotra-Heyse-Tukey approach that leverages the natural grouping of AEs by body systems. We use simulations to show that, on the basis of FDR and power, our procedure is an attractive alternative to the following: (i) the no-adjustment approach; (ii) a one-step FDR approach that ignores the grouping of AEs by body systems; and (iii) a recently proposed two-step FDR approach for much larger-scale settings such as genome-wide association studies. We use three clinical trial examples for illustration.
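A hedged sketch of a two-step, body-system-grouped flagging scheme in the spirit of the 'double FDR' idea behind the Mehrotra-Heyse-Tukey approach; the published procedure's levels and system-level statistics differ.

```python
import numpy as np

def bh_mask(pvals, q):
    """Benjamini-Hochberg step-up rejection mask."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

def two_step_flag(ae_pvals_by_system, q1=0.10, q2=0.05):
    """Step 1: BH at level q1 across body systems, each represented by its
    Bonferroni-corrected minimum AE p-value. Step 2: BH at level q2 on the
    individual AE p-values within the flagged systems only."""
    systems = list(ae_pvals_by_system)
    rep = [min(1.0, min(ps) * len(ps)) for ps in ae_pvals_by_system.values()]
    kept = bh_mask(rep, q1)
    return {s: bh_mask(ae_pvals_by_system[s], q2)
            for s, keep in zip(systems, kept) if keep}
```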

14.
Multiple comparisons for microarray data
Objective: To introduce the false discovery rate (FDR) and related control methods and their application to multiple comparisons of microarray data.
Methods: Four FDR-controlling procedures (BH, BL, BY, and ALSU) were used to compare the differential expression of 3226 genes between two groups of breast cancer patients.
Results: All four procedures controlled the FDR below 0.05 within their respective ranges of applicability; in descending order of power: ALSU > BH > BY > BL. The ALSU procedure, which incorporates an estimate of m0, is more reasonable: it not only improved power but also kept false positive errors well controlled.
Conclusion: FDR control must be considered in microarray comparisons, together with improving power. In multiple comparisons, controlling the FDR yields higher power than controlling the familywise error rate (FWER) and is more practical.
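The extra power attributed to ALSU comes from plugging an estimate of m0, the number of true null hypotheses, into the linear step-up threshold. Below is a sketch with a Storey-type m0 estimator; the estimator used by the actual ALSU procedure differs.

```python
import numpy as np

def storey_m0(pvals, lam=0.5):
    """Storey-type estimate of the number of true null hypotheses m0."""
    p = np.asarray(pvals)
    return min(len(p), (p > lam).sum() / (1.0 - lam))

def adaptive_bh(pvals, q=0.05, lam=0.5):
    """Adaptive linear step-up: plug m0_hat into the BH threshold, i.e. run
    BH at the inflated level q * m / m0_hat. When m0_hat < m this rejects
    more than plain BH, which is the source of the extra power."""
    p = np.asarray(pvals)
    m = len(p)
    m0 = max(storey_m0(p, lam), 1.0)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m0
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```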

15.
Shao Y, Tseng CH. Statistics in Medicine 2007; 26(23): 4219-4237
DNA microarrays have been widely used for the purpose of simultaneously monitoring a large number of gene expression levels to identify differentially expressed genes. Statistical methods for the adjustment of multiple testing have been discussed extensively in the literature. An important further challenge is the existence of dependence among test statistics due to reasons such as gene co-regulation. To plan large-scale genomic studies, sample size determination with appropriate adjustment for both multiple testing and potential dependency among test statistics is crucial to avoid an abundance of false-positive results and/or serious lack of power. We introduce a general approach for calculating sample sizes for two-way multiple comparisons in the presence of dependence among test statistics to ensure adequate overall power when the false discovery rates are controlled. The usefulness of the proposed method is demonstrated via numerical studies using both simulated data and real data from a well-known study of leukaemia.

16.
One of the main roles of omics-based association studies with high-throughput technologies is to screen relevant molecular features, such as genetic variants, genes, and proteins, from a large pool of candidates based on their associations with the phenotype of interest. Typically, screened features are subject to validation studies using more established or conventional assays, where the number of evaluable features is relatively limited, so there may be a fixed number of features measurable by these assays. Such a limitation necessitates narrowing a feature set down to a fixed size, following an initial screening analysis via multiple testing in which adjustment for multiplicity is made. We propose a two-stage screening approach that controls the false discovery rate (FDR) for a feature set of fixed size destined for validation studies, rather than for the feature set from the initial screening analysis. Out of the feature set selected in the first stage at a relaxed FDR level, a fraction of features with the most statistical significance is selected first. From the remaining feature set, features are selected based on biological consideration only, without regard to any statistical information, which allows the FDR level of the finally selected, fixed-size feature set to be evaluated. We discuss the improvement in power achieved by the proposed two-stage screening approach. Simulation experiments based on parametric models and real microarray datasets demonstrated a substantial increase in the number of features screened for biological consideration compared with the standard screening approach, allowing for more extensive and in-depth biological investigations in omics association studies.

17.
When a large number of hypotheses are investigated, we propose multi-stage designs in which promising hypotheses are screened at each interim analysis and investigated in further stages. Given a fixed overall number of observations, this allows one to spend more observations on promising hypotheses than with single-stage designs, where the observations are distributed equally among all considered hypotheses. We propose multi-stage procedures controlling either the familywise error rate (FWER) or the false discovery rate (FDR) and derive asymptotically optimal stopping boundaries and sample size allocations (across stages) to maximize the power of the procedure. Optimized two-stage designs lead to a considerable increase in power compared with the classical single-stage design. Going from two to three stages additionally leads to a distinct increase in power. Adding a fourth stage leads to a further improvement, which is, however, less pronounced. Surprisingly, we found only small differences in power between optimized integrated designs, where the data of all stages are used in the final test statistics, and optimized pilot designs, where only the data from the final stage are used for testing. However, the integrated design controlling the FDR appeared to be more robust against misspecifications in the planning phase. Additionally, we found that with an increasing number of stages, the drop in power when controlling the FWER instead of the FDR becomes negligible. Our investigations show that the crucial point is not the choice of the error rate or the type of design, but the sequential nature of the trial, where non-promising hypotheses are dropped in the early phases of the experiment.
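A toy version of the pilot design for m one-sample z-tests, with all constants invented: screening on stage-1 data carries no error-rate cost here because the final test uses independent stage-2 data with a Bonferroni adjustment against the original m.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def pilot_two_stage(mu, n_total, frac_stage1=0.5, keep=0.1, alpha=0.05):
    """Pilot two-stage design for m one-sample z-tests: spread stage-1
    observations over all m hypotheses, keep the top fraction, spend the rest
    of the budget on the survivors, and test with stage-2 data only,
    Bonferroni-adjusted against the original m so the FWER is controlled."""
    m = len(mu)
    n1 = int(n_total * frac_stage1) // m               # stage-1 obs per test
    z1 = mu * np.sqrt(n1) + rng.standard_normal(m)
    k = max(int(keep * m), 1)
    top = np.argsort(z1)[-k:]                          # promising hypotheses
    n2 = (n_total - n1 * m) // k                       # stage-2 obs per survivor
    z2 = mu[top] * np.sqrt(n2) + rng.standard_normal(k)
    return top[stats.norm.sf(z2) <= alpha / m]         # rejected hypotheses

mu = np.r_[np.full(20, 0.5), np.zeros(980)]            # 20 signals among 1000
print(len(pilot_two_stage(mu, n_total=50_000)))
```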

18.
Pharmacovigilance spontaneous reporting systems are primarily devoted to early detection of adverse reactions to marketed drugs. They maintain large spontaneous reporting databases (SRD) for which several automatic signalling methods have been developed. A common limitation of these methods is that they do not provide an auto-evaluation of the generated signals, so thresholds of alerts are chosen arbitrarily. In this paper, we revisit the Gamma Poisson Shrinkage (GPS) model and the Bayesian Confidence Propagation Neural Network (BCPNN) model in the Bayesian general decision framework. This results in a new signal ranking procedure based on the posterior probability of the null hypothesis of interest, and makes it possible to derive, with a non-mixture modelling approach, Bayesian estimators of the false discovery rate (FDR), false negative rate, sensitivity, and specificity. An original data generation process that can be suited to the features of the SRD under scrutiny is proposed and applied to the French SRD to perform a large simulation study. Results indicate better performance according to the FDR for the proposed ranking procedure in comparison with current ones for the GPS model. They also reveal identical performance according to the four operating characteristics for the proposed ranking procedure with the BCPNN and GPS models, but better estimates when using the GPS model. Finally, the proposed procedure is applied to the French data.
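The full GPS model places a two-component gamma mixture on the relative risk; as a hedged single-component simplification, conjugacy gives the posterior probability of the null in closed form, which is the kind of ranking statistic the paper advocates.

```python
from scipy import stats

def posterior_null_prob(n_obs, n_expected, a=1.0, b=1.0):
    """Posterior probability of the null 'relative risk <= 1' for one
    drug-event pair under a single-component conjugate simplification of the
    GPS model: N ~ Poisson(mu * E) with mu ~ Gamma(a, b) a priori, hence
    mu | N, E ~ Gamma(a + N, b + E)."""
    return stats.gamma.cdf(1.0, a + n_obs, scale=1.0 / (b + n_expected))

# Rank pairs by posterior null probability (smaller = stronger signal);
# the counts and expected counts below are invented.
pairs = {"drugA-eventX": (12, 4.0), "drugB-eventY": (3, 2.5)}
print(sorted(pairs, key=lambda name: posterior_null_prob(*pairs[name])))
```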

19.
Bauer P. Statistics in Medicine 2008; 27(10): 1565-1580
The statistical principles of fully adaptive designs are outlined. The options for flexibility and the price to be paid in terms of statistical properties of the test procedures are discussed. It is stressed that controlled inference after major design modifications (changing hypotheses) will include a penalty: intersections among all the hypotheses considered throughout the trial have to be rejected before testing individual hypotheses. Moreover, feasibility in terms of integrity and persuasiveness of the results achieved after adaptations based on unblinded data is considered the crucial issue in practice. In the second part, sample size adaptive procedures are considered for testing a large number of hypotheses under constraints on total sample size, as in genetic studies. The advantage of sequential procedures is sketched for the example of two-stage designs with a pilot phase for screening promising hypotheses (markers) while controlling the false discovery rate. Finally, we turn to the clinical problem of how to select markers and estimate a score from limited samples, e.g. for predicting the response to therapy of a future patient. The predictive ability of such scores will be rather poor when a large number of hypotheses are investigated and truly large marker effects are lacking. An obvious dilemma shows up: more optimistic selection rules may be superior if effective markers do exist, but will produce more nuisance prediction than more cautious strategies (e.g. those aiming at some control of type I error probabilities) if no effective markers exist.

20.
Multiple testing has been widely adopted for genome-wide studies such as microarray experiments. To improve the power of multiple testing, Storey (J. Royal Statist. Soc. B 2007; 69: 347-368) recently developed the optimal discovery procedure (ODP) which maximizes the number of expected true positives for each fixed number of expected false positives. However, in applying the ODP, we must estimate the true status of each significance test (null or alternative) and the true probability distribution corresponding to each test. In this article, we derive the ODP under hierarchical, random effects models and develop an empirical Bayes estimation method for the derived ODP. Our methods can effectively circumvent the estimation problems in applying the ODP presented by Storey. Simulations and applications to clinical studies of leukemia and breast cancer demonstrated that our empirical Bayes method achieved theoretical optimality and performed well in comparison with existing multiple testing procedures.
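A sketch of the ODP statistic for normal test statistics: the summed estimated alternative likelihoods divided by the summed null likelihoods. The plug-in means below are placeholders for the quantities the paper estimates empirically.

```python
import numpy as np
from scipy import stats

def odp_statistic(x, null_means, alt_means, sigma=1.0):
    """ODP score for each observed statistic in x: the summed likelihood under
    the (estimated) alternative distributions divided by the summed likelihood
    under the (estimated) null distributions. Large scores are significant;
    the cut-off is calibrated to the target error rate (e.g. FDR)."""
    num = stats.norm.pdf(x, loc=np.asarray(alt_means)[:, None], scale=sigma).sum(0)
    den = stats.norm.pdf(x, loc=np.asarray(null_means)[:, None], scale=sigma).sum(0)
    return num / den

# Plug-in means are placeholders; in the paper they come from an empirical
# Bayes fit of a hierarchical random effects model.
x = np.array([0.2, 2.8, 3.5, -0.4])
print(odp_statistic(x, null_means=[0.0], alt_means=[2.5, 3.0]))
```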
