Similar Articles
20 similar articles found (search time: 948 ms)
1.
When simultaneously testing multiple hypotheses, the usual approach in the context of confirmatory clinical trials is to control the familywise error rate (FWER), which bounds the probability of making at least one false rejection. In many trial settings, these hypotheses will additionally have a hierarchical structure that reflects the relative importance and links between different clinical objectives. The graphical approach of Bretz et al. (2009) is a flexible and easily communicable way of controlling the FWER while respecting complex trial objectives and multiple structured hypotheses. However, the FWER can be a very stringent criterion that leads to procedures with low power, and may not be appropriate in exploratory trial settings. This motivates controlling generalized error rates, particularly when the number of hypotheses tested is no longer small. We consider the generalized familywise error rate (k-FWER), which is the probability of making k or more false rejections, as well as the tail probability of the false discovery proportion (FDP), which is the probability that the proportion of false rejections is greater than some threshold. We also consider asymptotic control of the false discovery rate, which is the expectation of the FDP. In this article, we show how to control these generalized error rates when using the graphical approach and its extensions. We demonstrate the utility of the resulting graphical procedures on three clinical trial case studies.
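For reference, the error rates named in this abstract have compact definitions. The sketch below uses notation not given in the abstract itself: V for the number of false rejections, R for the total number of rejections, and γ for the FDP threshold.

```latex
\mathrm{FWER} = \Pr(V \ge 1), \qquad
k\text{-FWER} = \Pr(V \ge k), \qquad
\mathrm{FDP} = \frac{V}{\max(R,\,1)}, \qquad
\text{FDP tail probability} = \Pr(\mathrm{FDP} > \gamma), \qquad
\mathrm{FDR} = \mathbb{E}[\mathrm{FDP}].
```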

2.
The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated with some phenotypes. We show that applying FDR-controlling procedures to the entire collection of hypotheses fails to control the rate of false discovery of associated variants, as well as the expected value of the average proportion of false discoveries among the phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.

3.
This paper presents an overview of the current state of the art in multiple testing in genomics data from a user's perspective. We describe methods for familywise error control, false discovery rate control and false discovery proportion estimation and confidence, both conceptually and practically, and explain when to use which type of error rate. We elaborate on the assumptions underlying the methods and discuss pitfalls in the interpretation of results. In our discussion, we take into account the exploratory nature of genomics experiments, looking at selection of genes before or after testing, and at the role of validation experiments.

4.
Objectives: Procedures for controlling the false positive rate when performing many hypothesis tests are commonplace in health and medical studies. Such procedures, most notably the Bonferroni adjustment, suffer from the problem that error rate control cannot be localized to individual tests, and they do not distinguish between exploratory and/or data-driven testing vs. hypothesis-driven testing. Procedures derived from limiting false discovery rates may instead offer a more appealing way to control error rates in multiple tests. Study Design and Setting: Controlling the false positive rate can lead to philosophical inconsistencies that can negatively impact the practice of reporting statistically significant findings. We demonstrate that the false discovery rate approach can overcome these inconsistencies and illustrate its benefit through an application to two recent health studies. Results: The false discovery rate approach is more powerful than methods like the Bonferroni procedure that control false positive rates. Controlling the false discovery rate in a study that arguably consisted of scientifically driven hypotheses found nearly as many significant results as without any adjustment, whereas the Bonferroni procedure found no significant results. Conclusion: Although still unfamiliar to many health researchers, the use of false discovery rate control in the context of multiple testing can provide a solid basis for drawing conclusions about statistical significance.
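As an illustration of the contrast drawn in this abstract, here is a minimal sketch of the two adjustments in Python. The p-values are invented for illustration and are not taken from the studies discussed above.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the FWER)."""
    p = np.asarray(pvals)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH procedure: controls the FDR at level alpha for independent tests."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest i with p_(i) <= alpha * i / m
        reject[order[: k + 1]] = True
    return reject

# Illustrative p-values (made up, not from the studies in the abstract)
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.900]
print(bonferroni(pvals).sum(), "rejections with Bonferroni")
print(benjamini_hochberg(pvals).sum(), "rejections with BH")
```

With these numbers, Bonferroni rejects only the smallest p-value while BH rejects two, echoing the abstract's point that FDR control retains more discoveries than FWER control.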

5.
Current analysis of event-related potential (ERP) data is usually based on the a priori selection of channels and time windows of interest for studying the differences between experimental conditions in the spatio-temporal domain. In this work we put forward a new strategy designed for situations when there is no a priori information about 'when' and 'where' these differences appear in the spatio-temporal domain, requiring numerous hypotheses to be tested simultaneously, which increases the risk of false positives. This issue is known as the problem of multiple comparisons, and it has been managed with correction methods such as the permutation test and methods that control the false discovery rate (FDR). Although the former has been applied before, to our knowledge FDR methods have not yet been introduced into ERP data analysis. Here we compare the performance (on simulated and real data) of the permutation test and two FDR methods (Benjamini and Hochberg (BH) and local-fdr, by Efron). All these methods have been shown to be valid for dealing with the problem of multiple comparisons in ERP analysis, avoiding the ad hoc selection of channels and/or time windows. FDR methods are a good alternative to the common and computationally more expensive permutation test. The BH method for independent tests gave the best overall performance regarding the balance between type I and type II errors. The local-fdr method is preferable for high-dimensional (multichannel) problems where most of the tests conform to the empirical null hypothesis. Differences among the methods according to assumptions, null distributions and dimensionality of the problem are also discussed.

6.
Next-generation DNA sequencing technologies are facilitating large-scale association studies of rare genetic variants. The depth of the sequence read coverage is an important experimental variable in the next-generation technologies and it is a major determinant of the quality of genotype calls generated from sequence data. When case and control samples are sequenced separately or in different proportions across batches, they are unlikely to be matched on sequencing read depth and a differential misclassification of genotypes can result, causing confounding and an increased false-positive rate. Data from Pilot Study 3 of the 1000 Genomes project was used to demonstrate that a difference between the mean sequencing read depth of case and control samples can result in false-positive association for rare and uncommon variants, even when the mean coverage depth exceeds 30× in both groups. The degree of the confounding and inflation in the false-positive rate depended on the extent to which the mean depth was different in the case and control groups. A logistic regression model was used to test for association between case-control status and the cumulative number of alleles in a collapsed set of rare and uncommon variants. Including each individual's mean sequence read depth across the variant sites in the logistic regression model nearly eliminated the confounding effect and the inflated false-positive rate. Furthermore, accounting for the potential error by modeling the probability of the heterozygote genotype calls in the regression analysis had a relatively minor but beneficial effect on the statistical results. Genet. Epidemiol. 35: 261-268, 2011.
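A minimal sketch of the kind of depth-adjusted burden test described here, using synthetic data and hypothetical column names (case, rare_count, mean_depth). The modelling idea, regressing case-control status on the collapsed rare-allele count while including each subject's mean read depth as a covariate, follows the abstract; the code is not the authors' implementation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data frame, one row per subject: 'rare_count' is the collapsed
# number of minor alleles across the rare/uncommon variant sites, 'mean_depth'
# is that subject's mean sequencing read depth over those sites.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "case":       rng.integers(0, 2, n),     # 1 = case, 0 = control
    "rare_count": rng.poisson(2.0, n),
    "mean_depth": rng.normal(35, 8, n),
})

# Unadjusted collapsing test: case status ~ burden of rare alleles.
X0 = sm.add_constant(df[["rare_count"]])
fit0 = sm.Logit(df["case"], X0).fit(disp=0)

# Depth-adjusted model, as advocated in the abstract: adding each subject's mean
# read depth as a covariate absorbs depth-driven genotype misclassification that
# would otherwise confound the burden test.
X1 = sm.add_constant(df[["rare_count", "mean_depth"]])
fit1 = sm.Logit(df["case"], X1).fit(disp=0)

print(fit0.pvalues["rare_count"], fit1.pvalues["rare_count"])
```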

7.
Phase II clinical trials make the critical go or no-go decision for subsequent phase III studies. A considerable proportion of promising drugs identified in phase II trials fail the confirmatory efficacy test in phase III. Recognizing the low posterior probabilities of H1 when accepting the drug under Simon's two-stage design, the Bayesian enhancement two-stage (BET) design was proposed to strengthen the passing criterion. Under the BET design, the lengths of highest posterior density (HPD) intervals and the posterior probabilities of H0 and H1 are computed to calibrate the design parameters, aiming to improve the stability of the trial characteristics and strengthen the evidence for moving the drug development forward. However, from a practical perspective, the HPD interval length lacks transparency and interpretability. To circumvent this problem, we propose the BET design with error control (BETEC), which replaces the HPD interval length with the posterior error rate. The BETEC design can achieve a balance between the posterior false positive rate and false negative rate and, more importantly, it has an intuitive and clear interpretation. We compare our method with the BET design and Simon's design through extensive simulation studies. As an illustration, we further apply BETEC to two recent clinical trials and investigate its performance in comparison with other competing designs. Being both efficient and intuitive, the BETEC design can serve as an alternative toolbox for implementing phase II single-arm trials.
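The posterior quantities such a design is built on are easy to illustrate. Below is a minimal sketch for a single-arm binomial endpoint with a Beta prior; the response rates, prior, and data are invented, and the snippet shows the kind of posterior error rates involved rather than the actual BETEC calibration.

```python
from scipy.stats import beta

# Single-arm binomial setting: H0: p <= p0 (ineffective) vs H1: p >= p1 (effective).
# With a Beta(a, b) prior on the response rate p and x responses in n patients,
# the posterior is Beta(a + x, b + n - x).  All numbers below are illustrative.
p0, p1 = 0.20, 0.40          # null and alternative response rates (assumed)
a, b = 1.0, 1.0              # flat prior (assumed)
n, x = 29, 10                # hypothetical end-of-trial data

post = beta(a + x, b + n - x)
prob_H0 = post.cdf(p0)       # posterior probability that the drug is ineffective
prob_H1 = post.sf(p1)        # posterior probability that the drug is effective

# If the trial is declared positive, prob_H0 plays the role of a posterior false
# positive rate; if it is declared negative, prob_H1 is the posterior false
# negative rate.  A BETEC-style rule would require prob_H0 to be small before
# passing the drug on to phase III.
print(f"P(p <= {p0} | data) = {prob_H0:.3f},  P(p >= {p1} | data) = {prob_H1:.3f}")
```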

8.
The original definitions of false discovery rate (FDR) and false non-discovery rate (FNR) can be understood as the frequentist risks of false rejections and false non-rejections, respectively, conditional on the unknown parameter, while the Bayesian posterior FDR and posterior FNR are conditioned on the data. From a Bayesian point of view, it seems natural to take into account the uncertainties in both the parameter and the data. In this spirit, we propose averaging out the frequentist risks of false rejections and false non-rejections with respect to some prior distribution of the parameters to obtain the average FDR (AFDR) and average FNR (AFNR), respectively. A linear combination of the AFDR and AFNR, called the average Bayes error rate (ABER), is considered as an overall risk. Some useful formulas for the AFDR, AFNR and ABER are developed for normal samples with hierarchical mixture priors. The idea of finding threshold values by minimizing the ABER or controlling the AFDR is illustrated using a gene expression data set. Simulation studies show that the proposed approaches are more powerful and robust than the widely used FDR method.
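Written out, the quantities proposed in this abstract take the following form; the weights w1 and w2 in the linear combination are notation introduced here for illustration, not taken from the paper.

```latex
\mathrm{AFDR} = \int \mathrm{FDR}(\theta)\,\pi(\theta)\,d\theta, \qquad
\mathrm{AFNR} = \int \mathrm{FNR}(\theta)\,\pi(\theta)\,d\theta, \qquad
\mathrm{ABER} = w_1\,\mathrm{AFDR} + w_2\,\mathrm{AFNR},
```

where π is the prior on the parameter θ and FDR(θ), FNR(θ) are the frequentist risks conditional on θ.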

9.
The cumulative sum (CUSUM) control chart is a method for detecting whether the mean of a time series process has shifted beyond some tolerance (ie, is out of control). Originally developed in an industrial process control setting, the CUSUM statistic is typically reset to zero once a process is discovered to be out of control, since the industrial process is then recalibrated to be in control. The CUSUM method is also used to detect disease outbreaks in prospective disease surveillance, with a disease outbreak coinciding with an out-of-control process. In a disease surveillance setting, resetting the CUSUM statistic is unrealistic, and a nonrestarting CUSUM chart is used instead. In practice, the nonrestarting CUSUM provides more information but suffers from a high false alarm rate following the end of an outbreak. In this paper, we propose a modified hypothesis test for use with the nonrestarting CUSUM when testing whether a process is out of control. By simulating statistics conditional on the presence of an out-of-control process in recent time periods, we are able to retain the CUSUM's power to detect an out-of-control process while controlling the post-out-of-control false alarm rate at the desired level. We demonstrate this method using data on a Salmonella Newport outbreak that occurred in Germany in 2011. We find that in 7 out of 8 states where the outbreak was detected, our method detected it as quickly as an unmodified nonrestarting CUSUM while controlling the post-outbreak rate of false alarms at the desired level.
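For context, here is a minimal sketch of the CUSUM recursion with and without restarting, on made-up Poisson counts. It illustrates the post-outbreak false alarm behaviour the abstract describes; it is not the authors' modified conditional hypothesis test.

```python
import numpy as np

def cusum(x, k, h, restart=True):
    """One-sided upper CUSUM: S_t = max(0, S_{t-1} + x_t - k), alarm when S_t > h.

    With restart=True the statistic is reset to zero after each alarm (the
    classical industrial-process chart); with restart=False it keeps
    accumulating, which is the nonrestarting chart used in disease surveillance.
    """
    s, stats, alarms = 0.0, [], []
    for xt in x:
        s = max(0.0, s + xt - k)
        alarm = s > h
        if alarm and restart:
            s = 0.0
        stats.append(s)
        alarms.append(alarm)
    return np.array(stats), np.array(alarms)

# Made-up daily counts: in-control mean 2, with a 10-day elevated stretch.
rng = np.random.default_rng(0)
x = np.concatenate([rng.poisson(2, 40), rng.poisson(6, 10), rng.poisson(2, 40)])
_, alarms_nr = cusum(x - 2, k=0.5, h=4.0, restart=False)   # nonrestarting
_, alarms_r  = cusum(x - 2, k=0.5, h=4.0, restart=True)    # restarting
print(f"alarm days: {alarms_nr.sum()} nonrestarting vs {alarms_r.sum()} restarting")
```

The nonrestarting chart keeps signalling long after the elevated stretch ends, which is exactly the post-outbreak false alarm problem the proposed test addresses.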

10.
In this paper, we propose a large-scale multiple testing procedure to find the significant sub-areas between two samples of curves automatically. The procedure is optimal in that it controls the directional false discovery rate at any specified level on a continuum asymptotically. By introducing a nonparametric Gaussian process regression model for the two-sided multiple test, the procedure is computationally inexpensive. It can cope with problems involving multidimensional covariates and accommodate different sampling designs across the samples. We further propose the significant curve/surface, giving insight into dynamic significant differences between two curves. Simulation studies demonstrate that the proposed procedure enjoys superior performance with strong power and good directional error control. The procedure is also illustrated with an application to two executive function studies in hemiplegia.

11.
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross-validated log-likelihood (max-cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off the goodness-of-fit against the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to a reduction of the FDR without a large increase in the FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited rise in the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted.

12.
In recent years, quality control charts have been increasingly applied in the healthcare environment, for example, to monitor surgical performance. Risk-adjusted cumulative sum (CUSUM) charts that utilize risk scores, such as the Parsonnet score, to estimate a patient's probability of death from an operation turn out to be susceptible to misfitted risk models, which degrade the charts' properties, in particular their false alarm behavior. Our approach considers the application of power transformations in the logistic regression model to improve the fit to the binary outcome data. We propose two different approaches to estimating the power exponent δ. The average run length (ARL) to false alarm is calculated with the popular Markov chain approximation in a more efficient way by utilizing the Toeplitz structure of the transition matrix. A sensitivity analysis of the in-control ARL against the true value of δ shows the potential effects of an incorrect choice of δ. Depending on the underlying patient mix, the results vary from robustness to severe impact (a doubling of the false alarm rate).
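For readers unfamiliar with the Markov chain approximation mentioned here, the sketch below computes the zero-start ARL of a plain (non-risk-adjusted) normal CUSUM by discretizing the statistic and solving (I − Q)x = 1. The authors' charts are risk-adjusted and exploit the Toeplitz structure of Q for a faster computation; this textbook version does not.

```python
import numpy as np
from scipy.stats import norm

def cusum_arl(k, h, m=200, mu=0.0, sigma=1.0):
    """Brook-Evans Markov chain approximation to the zero-start average run
    length of a one-sided CUSUM S_t = max(0, S_{t-1} + X_t - k) with alarm at
    S_t > h, for i.i.d. N(mu, sigma^2) observations X_t."""
    w = h / m                                # width of each discretized state
    centers = (np.arange(m) + 0.5) * w       # midpoints of the transient states
    Q = np.empty((m, m))                     # transitions among non-alarm states
    for i, ci in enumerate(centers):
        upper = norm.cdf(centers + w / 2 - ci + k, loc=mu, scale=sigma)
        lower = norm.cdf(centers - w / 2 - ci + k, loc=mu, scale=sigma)
        Q[i] = upper - lower                 # depends only on j - i: the Toeplitz structure the abstract exploits
        Q[i, 0] += norm.cdf(-ci + k, loc=mu, scale=sigma)   # mass reflected at zero
    arl = np.linalg.solve(np.eye(m) - Q, np.ones(m))
    return arl[0]                            # chart started at S_0 = 0

# In-control ARL of a standard CUSUM (k = 0.5, h = 4) for N(0, 1) data.
print(round(cusum_arl(k=0.5, h=4.0), 1))
```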

13.
A central issue in genome-wide association (GWA) studies is assessing statistical significance while adjusting for multiple hypothesis testing. An equally important question is the statistical efficiency of the GWA design as compared to the traditional sequential approach in which genome-wide linkage analysis is followed by region-wise association mapping. Nevertheless, GWA is becoming more popular due in part to cost efficiency: commercially available 1M chips are nearly as inexpensive as a custom-designed 10K chip. It is becoming apparent, however, that most of the on-going GWA studies with 2,000–5,000 samples are in fact underpowered. As a means to improve power, we emphasize the importance of utilizing prior information such as results of previous linkage studies via a stratified false discovery rate (FDR) control. The essence of the stratified FDR control is to prioritize the genome and maintain power to interrogate candidate regions within the GWA study. These candidate regions can be defined as, but are by no means limited to, linkage-peak regions. Furthermore, we theoretically unify the stratified FDR approach and the weighted P-value method, and we show that stratified FDR can be formulated as a robust version of weighted FDR. Finally, we demonstrate the utility of the methods in two GWA datasets: Type 2 diabetes (FUSION) and an on-going study of long-term diabetic complications (DCCT/EDIC). The methods are implemented as a user-friendly software package, SFDR. The same stratification framework can be readily applied to other types of studies, for example, using GWA results to improve the power of sequencing data analyses. Genet. Epidemiol. 34: 107–118, 2010.
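A minimal sketch of the stratification idea on synthetic p-values: the standard BH procedure from statsmodels is run separately within a candidate stratum (e.g., SNPs under linkage peaks) and the rest of the genome. This only illustrates why stratifying preserves power; the paper's SFDR procedure and its weighted-FDR formulation, implemented in the SFDR software, are more general.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical GWA p-values split into two strata: SNPs under previously
# reported linkage peaks ("candidate") and the remaining genome ("other").
rng = np.random.default_rng(2)
p_candidate = np.concatenate([rng.uniform(0, 1e-4, 5), rng.uniform(0, 1, 995)])
p_other     = rng.uniform(0, 1, 99_000)

# Unstratified BH over the whole genome.
reject_all, *_ = multipletests(np.r_[p_candidate, p_other], alpha=0.05, method="fdr_bh")

# Stratified FDR control: run BH separately within each stratum at the same
# level, so signal-rich candidate regions are not swamped by the genome-wide bulk.
reject_cand, *_  = multipletests(p_candidate, alpha=0.05, method="fdr_bh")
reject_other, *_ = multipletests(p_other,     alpha=0.05, method="fdr_bh")

print("unstratified discoveries:", reject_all.sum())
print("stratified discoveries:  ", reject_cand.sum() + reject_other.sum())
```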

14.
Multiple endpoints are increasingly used in clinical trials. The significance of some of these clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. The usual approach in multiple hypothesis testing is to control the family-wise error rate, which is defined as the probability that at least one type-I error is made. More recently, the q-generalized family-wise error rate has been introduced to control the probability of making at least q false rejections. For procedures controlling this global type-I error rate, we define a type-II r-generalized family-wise error rate, which is directly related to the r-power defined as the probability of rejecting at least r false null hypotheses. We obtain very general power formulas that can be used to compute the sample size for single-step and step-wise procedures. These are implemented in our R package rPowerSampleSize, available on CRAN, making them directly available to end users. The complexity of the formulas is presented to give insight into computation time issues. A comparison with a Monte Carlo strategy is also presented. We compute sample sizes for two clinical trials involving multiple endpoints: one designed to investigate the effectiveness of a drug against acute heart failure and the other for the immunogenicity of a vaccine strategy against pneumococcus.
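The r-power is straightforward to estimate by simulation. Below is a hedged sketch for a single-step Lehmann-Romano-type procedure (reject H_i when p_i ≤ qα/m, which controls the q-generalized FWER), with independent normal endpoints and invented effect sizes; it is a Monte Carlo stand-in for, not a reimplementation of, the closed-form formulas in the rPowerSampleSize package.

```python
import numpy as np
from scipy.stats import norm

def r_power_mc(n_per_group, deltas, r, q, alpha=0.05, n_sim=20_000, seed=3):
    """Monte Carlo estimate of the r-power (probability of rejecting at least r
    false nulls) of the single-step procedure that rejects H_i when
    p_i <= q * alpha / m.  Endpoints are modelled as independent normal
    endpoints with unit variance and standardized effects `deltas`
    (a simplification; the paper also covers correlated endpoints and
    step-wise procedures)."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    m = deltas.size
    se = np.sqrt(2.0 / n_per_group)                       # SE of a two-sample mean difference
    z = rng.normal(deltas / se, 1.0, size=(n_sim, m))     # per-endpoint test statistics
    pvals = norm.sf(z)                                    # one-sided p-values
    n_true_rejections = (pvals[:, deltas > 0] <= q * alpha / m).sum(axis=1)
    return (n_true_rejections >= r).mean()

# Four endpoints, three with a real effect; require at least r = 2 successes.
print(r_power_mc(n_per_group=80, deltas=[0.4, 0.4, 0.3, 0.0], r=2, q=2))
```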

15.
In this part II of the paper on adaptive extensions of a two-stage group sequential procedure (GSP) for testing primary and secondary endpoints, we focus on the second stage sample size re-estimation based on the first stage data. First, we show that if we use the Cui–Huang–Wang statistics at the second stage, then we can use the same primary and secondary boundaries as for the original procedure (without sample size re-estimation) and still control the type I familywise error rate. This extends their result for the single endpoint case. We further show that the secondary boundary can be sharpened in this case by taking the unknown correlation coefficient ρ between the primary and secondary endpoints into account through the use of the confidence limit method proposed in part I of this paper. If we use the sufficient statistics instead of the CHW statistics, then we need to modify both the primary and secondary boundaries; otherwise, the error rate can get inflated. We show how to modify the boundaries of the original group sequential procedure to control the familywise error rate. We provide power comparisons between competing procedures. We illustrate the procedures with a clinical trial example.

16.
We examine the operating characteristics of 17 methods for correcting p-values for multiple testing on synthetic data with known statistical properties. These methods use the p-values only, not the raw data. Across the test cases, we systematically varied the number of p-values, the proportion of false null hypotheses, the probability that a false null hypothesis would result in a p-value less than 5 per cent, and the degree of correlation between p-values. We examined the effect of each of these factors on familywise and false negative error rates and compared the false negative error rates of methods with an acceptable familywise error. Only four methods were not bettered in this comparison. Unfortunately, however, a uniformly best method of those examined does not exist. A suggested strategy for examining corrections uses a succession of methods that are increasingly lax in familywise error. A computer program for these corrections is available.

17.
The risk-adjusted Bernoulli cumulative sum (CUSUM) chart developed by Steiner et al. (2000) is an increasingly popular tool for monitoring clinical and surgical performance. In practice, however, the use of a fixed control limit for the chart leads to quite variable in-control average run length performance for patient populations with different risk score distributions. To overcome this problem, we determine simulation-based dynamic probability control limits (DPCLs) patient-by-patient for the risk-adjusted Bernoulli CUSUM charts. By maintaining the probability of a false alarm at a constant level conditional on no false alarm for previous observations, our risk-adjusted CUSUM charts with DPCLs have consistent in-control performance at the desired level, with approximately geometrically distributed run lengths. Our simulation results demonstrate that our method does not rely on any information or assumptions about the patients' risk distributions. The use of DPCLs for risk-adjusted Bernoulli CUSUM charts allows each chart to be designed for the particular sequence of patients seen by a surgeon or hospital.

18.
This study examines the issue of false positives in genomic scans for detecting complex trait loci using sibpair linkage methods and investigates the trade-off between the rate of false positives and the rate of false negatives. It highlights the tremendous cost in terms of power brought about by an excessive control of type I error and, at the same time, confirms that, without such control, a larger number of false positives can occur in the course of a genomic scan. Finally, it compares the power and rate of false positives obtained in preplanned replicated studies conducted using a liberal significance level to those for single-step studies that use the same total sample size but stricter levels of significance. For the models considered here, replicate studies were found to be more attractive as long as one is willing to accept a trade-off, accepting a slight increase in the rate of false positives in exchange for a much lower rate of false negatives. Genet. Epidemiol. 14: 453–464, 1997.

19.
Controversy over non-reproducible published research reporting a statistically significant result has produced substantial discussion in the literature. p-value calibration is a recently proposed procedure for adjusting p-values to account for both random and systematic errors that addresses one aspect of this problem. The method's validity rests on the key assumption that bias in an effect estimate is drawn from a normal distribution whose mean and variance can be correctly estimated. We investigated the method's control of type I and type II error rates using simulated and real-world data. Under mild violations of underlying assumptions, control of the type I error rate can be conservative, while under more extreme departures, it can be anti-conservative. The extent to which the assumption is violated in real-world data analyses is unknown. Barriers to testing the plausibility of the assumption using historical data are discussed. Our studies of the type II error rate using simulated and real-world electronic health care data demonstrated that calibrating p-values can substantially increase the type II error rate. The use of calibrated p-values may reduce the number of false-positive results, but there will be a commensurate drop in the ability to detect a true safety or efficacy signal. While p-value calibration can sometimes offer advantages in controlling the type I error rate, its adoption for routine use in studies of real-world health care datasets is premature. Separate characterizations of random and systematic errors provide a richer context for evaluating uncertainty surrounding effect estimates.
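A simplified sketch of the calibration idea under the normal-bias assumption the abstract questions: the systematic error is taken to be N(μ, σ²) and folded into the standard error before computing a two-sided p-value. The numbers are invented, and the published procedure (which estimates μ and σ from negative-control estimates and treats the two tails separately) is more involved than this.

```python
import numpy as np
from scipy.stats import norm

def calibrated_p(beta_hat, se, mu_bias, sigma_bias):
    """Simplified two-sided calibrated p-value: the estimate is compared with an
    empirical null in which systematic error ~ N(mu_bias, sigma_bias^2) is added
    to the usual sampling error N(0, se^2)."""
    total_sd = np.sqrt(se ** 2 + sigma_bias ** 2)
    return 2 * norm.sf(abs(beta_hat - mu_bias) / total_sd)

# Illustrative numbers (not from the paper): a log hazard ratio of 0.25 with
# SE 0.10, and an empirical null taken as N(0.05, 0.10^2).
beta_hat, se = 0.25, 0.10
print("uncalibrated p:", 2 * norm.sf(abs(beta_hat) / se))
print("calibrated p:  ", calibrated_p(beta_hat, se, mu_bias=0.05, sigma_bias=0.10))
```

The calibrated p-value is much larger than the uncalibrated one, which mirrors the abstract's finding that calibration reduces false positives at the cost of a higher type II error rate.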

20.
High-throughput screening (HTS) is a large-scale hierarchical process in which a large number of chemicals are tested in multiple stages. Conventional statistical analyses of HTS studies often suffer from high testing error rates and soaring costs in large-scale settings. This article develops new methodologies for false discovery rate control and optimal design in HTS studies. We propose a two-stage procedure that determines the optimal numbers of replicates at different screening stages while simultaneously controlling the false discovery rate in the confirmatory stage, subject to a constraint on the total budget. The merits of the proposed methods are illustrated using both simulated and real data. We show that, under a limited budget, the proposed screening procedure effectively controls the error rate and that the design leads to improved detection power.
