Similar articles
20 similar articles retrieved (search time: 625 ms).
1.
Increasing the sample size based on an unblinded interim result may inflate the type I error rate, and appropriate statistical adjustments may be needed to control the type I error rate at the nominal level. We briefly review the existing approaches, which allow early stopping due to futility, change the test statistic by using different weights, adjust the critical value for the final test, or enforce rules for sample size recalculation. The implications of early stopping due to futility and a simple modification to the weighted Z-statistic approach are discussed. In this paper, we show that increasing the sample size when the unblinded interim result is promising will not inflate the type I error rate, and therefore no statistical adjustment is necessary. The unblinded interim result is considered promising if the conditional power is greater than 50 per cent or, equivalently, if the sample size increment needed to achieve a desired power does not exceed an upper bound. The actual sample size increment may be determined by important factors such as budget, the size of the eligible patient population, and competition in the market. The 50 per cent conditional power approach is extended to a group sequential trial with one interim analysis, where a decision may be made at the interim analysis to stop the trial early because of a convincing treatment benefit, or to increase the sample size if the interim result is not as good as expected. The type I error rate is not inflated if the sample size may be increased only when the conditional power is greater than 50 per cent. If there are two or more interim analyses in a group sequential trial, our simulation study shows that the type I error rate is also well controlled.
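As a concrete illustration of the quantity driving the 50 per cent rule, the sketch below computes conditional power under the commonly used current-trend (B-value) formulation for a one-sided Z-test. This is a generic textbook calculation written in Python, not code from the paper; a promising-zone rule would compare the returned value with 0.5.

```python
from scipy.stats import norm

def conditional_power(z_interim, info_frac, alpha=0.025):
    """Conditional power of a one-sided Z-test under the 'current trend',
    i.e. assuming the drift seen so far continues (B-value formulation)."""
    t = info_frac
    b_t = z_interim * t ** 0.5          # B-value at the interim look
    theta_hat = z_interim / t ** 0.5    # drift implied by the interim estimate
    z_alpha = norm.ppf(1 - alpha)       # unadjusted final critical value
    return norm.cdf((b_t + theta_hat * (1 - t) - z_alpha) / (1 - t) ** 0.5)

# Halfway through the trial with an interim Z of 1.2: promising if > 0.5
print(round(conditional_power(z_interim=1.2, info_frac=0.5), 3))
```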

2.
A major assumption of the Cox proportional hazards model is that the effect of a given covariate does not change over time. If this assumption is violated, the simple Cox model is invalid, and more sophisticated analyses are required. This paper describes eight graphical methods for detecting violations of the proportional hazards (PH) assumption and demonstrates each on three published datasets with a single binary covariate. I discuss the relative merits of these methods. Smoothed plots of the scaled Schoenfeld residuals are recommended for assessing PH violations because they provide precise, usable information about the time dependence of the covariate effects.
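In practice, the recommended smoothed scaled Schoenfeld residual plots can be obtained from standard survival software. The minimal sketch below assumes the Python lifelines package and its bundled Rossi recidivism data purely for illustration; the paper itself is not tied to any particular software.

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
from lifelines.statistics import proportional_hazard_test

df = load_rossi()  # recidivism data bundled with lifelines
cph = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")

# formal test of the PH assumption for each covariate
proportional_hazard_test(cph, df, time_transform="rank").print_summary()

# smoothed scaled Schoenfeld residual plots, one panel per flagged covariate
cph.check_assumptions(df, p_value_threshold=0.05, show_plots=True)
```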

3.
In clinical trials with time-to-event endpoints, it is not uncommon to see a significant proportion of patients being cured (or surviving long term), as in trials for non-Hodgkin's lymphoma. The widely used sample size formula derived under the proportional hazards (PH) model may not be appropriate for designing a survival trial with a cure fraction, because the PH assumption may be violated. To account for a cure fraction, the PH cure model is widely used in practice, where a PH model is used for the survival times of uncured patients and a logistic distribution is used for the probability of patients being cured. In this paper, we develop a sample size formula on the basis of the PH cure model by investigating the asymptotic distributions of the standard weighted log-rank statistics under the null and local alternative hypotheses. The derived sample size formula under the PH cure model is more flexible because it can be used to test differences in the short-term survival and/or the cure fraction. Furthermore, we investigate, through numerical examples, the impact of accrual methods and of the durations of the accrual and follow-up periods on sample size calculation. The results show that ignoring the cure rate in sample size calculation can lead to either underpowered or overpowered studies. We evaluate the performance of the proposed formula by simulation studies and provide an example to illustrate its application with data from a melanoma trial. Copyright © 2012 John Wiley & Sons, Ltd.
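For context, the standard PH-based calculation that the abstract argues can mislead in the presence of a cure fraction is the Schoenfeld events formula. The sketch below implements that baseline formula only; it is not the cure-model formula proposed in the paper, and the example hazard ratio and event probability are assumptions.

```python
import math
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.8, alloc=0.5):
    """Required number of events under the PH assumption (Schoenfeld formula)."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return (z_a + z_b) ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2)

def total_sample_size(hr, prob_event, **kw):
    """Convert required events into subjects via an overall event probability.
    With a cure fraction, prob_event shrinks and PH itself may fail, which is
    the abstract's motivation for a cure-model-based formula."""
    return math.ceil(schoenfeld_events(hr, **kw) / prob_event)

print(total_sample_size(hr=0.7, prob_event=0.6))  # hypothetical design inputs
```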

4.
Chen YH, Chen C. Statistics in Medicine 2012;31(15):1531-1542.
The shift in research and development strategy from developing follow-on or 'me-too' drugs to differentiated medical products with potentially better efficacy than the standard of care (e.g., first-in-class, best-in-class, and bio-betters) highlights the scientific and commercial interest in establishing superiority even when a non-inferiority design, adequately powered for a pre-specified non-inferiority margin, is appropriate for various reasons. In this paper, we propose a group sequential design to test superiority at interim analyses in a non-inferiority trial. We test superiority at the interim analyses using conventional group sequential methods, and we may stop the study because of better efficacy. If the study fails to establish superior efficacy at the interim and final analyses, we test the primary non-inferiority hypothesis at the final analysis at the nominal level without alpha adjustment. Although superiority/non-inferiority testing no longer has the hierarchical structure in which the rejection region for testing superiority is a subset of that for testing non-inferiority, the impact of repeated superiority tests on the false positive rate and statistical power of the primary non-inferiority test at the final analysis is essentially negligible. For the commonly used O'Brien-Fleming type alpha-spending function, we show that the impact is extremely small based on Brownian motion boundary-crossing properties. Numerical evaluation further supports this conclusion for other alpha-spending functions with a substantial amount of alpha being spent on the interim superiority tests. We use a clinical trial example to illustrate the proposed design.
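The key ingredient in the argument is how little alpha an O'Brien-Fleming-type spending function releases at early looks. The sketch below evaluates the Lan-DeMets O'Brien-Fleming-type spending function for a one-sided 0.025-level test; it is a generic illustration, not the paper's derivation.

```python
from scipy.stats import norm

def obrien_fleming_spend(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type alpha-spending function (one-sided)."""
    return 2 * (1 - norm.cdf(norm.ppf(1 - alpha / 2) / t ** 0.5))

for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: cumulative alpha spent = "
          f"{obrien_fleming_spend(t):.5f}")
```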

5.
Confidence intervals (CIs) and the reported predictive ability of statistical models may be misleading if one ignores uncertainty in the model selection procedure. When analyzing time-to-event data using Cox regression, one typically checks the proportional hazards (PH) assumption and subsequently alters the model to address any violations. Such an examination and correction constitute a model selection procedure and, if not accounted for, could result in misleading CIs. With the bootstrap, I study the impact of checking the PH assumption using (1) data to predict AIDS-free survival among HIV-infected patients initiating antiretroviral therapy and (2) simulated data. In the HIV study, due to non-PH, a Cox model was stratified on age quintiles. Interestingly, bootstrap CIs that ignored the PH check (always stratified on age quintiles) were wider than those that accounted for the PH check (on each bootstrap replication PH was tested and corrected through stratification only if violated). Simulations demonstrated that such a phenomenon is not an anomaly, although on average CIs widen when accounting for the PH check. In most simulation scenarios, coverage probabilities adjusting and not adjusting for the PH check were similar. However, when data were generated under a minor PH violation, the 95 per cent bootstrap CI ignoring the PH check had a coverage of 0.77, as opposed to 0.95 for the CI accounting for the PH check. The impact of checking the PH assumption is greatest when the p-value of the test for PH is close to the test's chosen type I error probability.
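A bootstrap that repeats the PH check on every replicate, as described above, can be sketched as follows. The sketch assumes the Python lifelines package and illustrative column names ('time', 'event', 'age_q'); it mimics the general idea of testing PH and stratifying only when the assumption is rejected, and is not the author's original code.

```python
import numpy as np
from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

def boot_coef_with_ph_check(df, covariate, n_boot=500, seed=2024):
    """Percentile bootstrap CI for a Cox log hazard ratio that repeats the PH
    check on every replicate and stratifies on 'age_q' only when PH is rejected.
    Column names ('time', 'event', 'age_q') are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_boot):
        boot = df.sample(len(df), replace=True,
                         random_state=int(rng.integers(2 ** 31 - 1)))
        cph = CoxPHFitter().fit(boot, duration_col="time", event_col="event")
        ph_p = proportional_hazard_test(cph, boot, time_transform="rank").p_value
        if np.min(ph_p) < 0.05:  # PH rejected -> refit stratified, mimicking the check
            cph = CoxPHFitter().fit(boot, duration_col="time", event_col="event",
                                    strata=["age_q"])
        coefs.append(cph.summary.loc[covariate, "coef"])
    return np.percentile(coefs, [2.5, 97.5])
```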

6.
Time-to-event data are very common in observational studies. Unlike randomized experiments, observational studies suffer from both observed and unobserved confounding biases. To adjust for observed confounding in survival analysis, the commonly used methods are the Cox proportional hazards (PH) model, the weighted logrank test, and the inverse probability of treatment weighted Cox PH model. These methods do not rely on fully parametric models, but their practical performance is highly influenced by the validity of the PH assumption. Also, there are few methods addressing hidden bias in causal survival analysis. We propose a strategy to test for survival function differences based on the matching design and explore the sensitivity of the P-values to assumptions about unmeasured confounding. Specifically, we apply the paired Prentice-Wilcoxon (PPW) test or the modified PPW test to the propensity score matched data. Simulation studies show that the PPW-type test has higher power in situations where the PH assumption fails. For potential hidden bias, we develop a sensitivity analysis based on the matched pairs to assess the robustness of our finding, following Rosenbaum's idea for nonsurvival data. For a real-data illustration, we apply our method to an observational cohort of chronic liver disease patients from a Mayo Clinic study. The PPW test based on the observed data initially shows evidence of a significant treatment effect, but this finding is not robust: the sensitivity analysis reveals that the P-value becomes nonsignificant if there exists an unmeasured confounder with a small impact.
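The Rosenbaum-style sensitivity analysis can be illustrated with a simplified sign-test version: for matched pairs, the worst-case one-sided p-value under hidden bias of magnitude Gamma is bounded by a binomial tail with success probability Gamma/(1+Gamma). The sketch below uses that simplified statistic, not the weighted paired Prentice-Wilcoxon test of the paper, and the pair counts are made up.

```python
from scipy.stats import binom

def sign_test_sensitivity(n_pairs, n_treated_better, gammas=(1.0, 1.5, 2.0, 3.0)):
    """Rosenbaum-style sensitivity bounds for a matched-pairs sign test:
    worst-case one-sided p-value allowing hidden bias up to odds ratio Gamma
    in treatment assignment. A simplified stand-in for the PPW statistic;
    tied or inconclusive pairs are assumed to have been dropped."""
    out = {}
    for g in gammas:
        p_plus = g / (1 + g)  # worst-case probability that the treated member "wins"
        out[g] = binom.sf(n_treated_better - 1, n_pairs, p_plus)  # P(T >= observed)
    return out

# e.g. 100 matched pairs, treated patient outlived control in 62 of them
for g, p in sign_test_sensitivity(100, 62).items():
    print(f"Gamma = {g}: worst-case p-value = {p:.4f}")
```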

7.
In this paper, we consider the potential bias in the estimated treatment effect obtained from clinical trials, the protocols of which include the possibility of interim analyses and an early termination of the study for reasons of futility. In particular, by considering the conditional power at an interim analysis, we derive analytic expressions for various parameters of interest: (i) the underestimation or overestimation of the treatment effect in studies that stop for futility; (ii) the impact of the interim analyses on the estimation of treatment effect in studies that are completed, i.e. that do not stop for futility; (iii) the overall estimation bias in the estimated treatment effect in a single study with such a stopping rule; and (iv) the probability of stopping at an interim analysis. We evaluate these general expressions numerically for typical trial scenarios. Results show that the parameters of interest depend on a number of factors, including the true underlying treatment effect, the difference that the trial is designed to detect, the study power, the number of planned interim analyses and what assumption is made about future data to be observed after an interim analysis. Because the probability of stopping early is small for many practical situations, the overall bias is often small, but a more serious issue is the potential for substantial underestimation of the treatment effect in studies that actually stop for futility. We also consider these ideas using data from an illustrative trial that did stop for futility at an interim analysis. Copyright © 2017 John Wiley & Sons, Ltd.
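The underestimation among trials that stop for futility is easy to see by simulation. The sketch below is a simplified Monte Carlo illustration (one interim look, a conditional-power futility rule, standardized effect estimates), not the paper's analytic expressions; the effect size, stage sizes, and futility threshold are assumptions.

```python
import numpy as np
from scipy.stats import norm

def futility_bias_sim(theta=0.15, n_per_stage=100, cp_threshold=0.20,
                      alpha=0.025, n_sim=200_000, seed=7):
    """Monte Carlo illustration of estimation bias with one interim futility
    look: stop if conditional power (current trend) falls below cp_threshold.
    'Estimates' are standardized effect estimates with unit variance per subject."""
    rng = np.random.default_rng(seed)
    se_stage = 1 / np.sqrt(n_per_stage)
    x1 = rng.normal(theta, se_stage, n_sim)      # stage-1 effect estimate
    x2 = rng.normal(theta, se_stage, n_sim)      # stage-2 effect estimate
    z1 = x1 / se_stage                           # interim Z at information fraction 0.5
    cp = norm.cdf((z1 / np.sqrt(0.5) - norm.ppf(1 - alpha)) / np.sqrt(0.5))
    stopped = cp < cp_threshold
    est_stopped = x1[stopped]                            # reported estimate if stopped
    est_completed = (x1[~stopped] + x2[~stopped]) / 2    # pooled estimate if completed
    overall = np.concatenate([est_stopped, est_completed])
    return est_stopped.mean(), est_completed.mean(), overall.mean()

# means among stopped, completed, and all trials, versus the true theta = 0.15
print([round(v, 4) for v in futility_bias_sim()])
```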

8.
Clinical trials incorporating treatment selection at pre-specified interim analyses allow two clinical studies to be integrated into a single, confirmatory study. In an adaptive interim analysis, treatment arms are selected based on interim data as well as external information. The specific selection rule does not need to be pre-specified in order to control the multiple type I error rate. We propose an adaptive Dunnett test procedure based on the conditional error rate of the single-stage Dunnett test. The adaptive procedure uniformly improves the classical Dunnett test, which is shown to be strictly conservative if treatments are dropped at the interim analysis. The adaptive Dunnett test is compared in a simulation with the classical Dunnett test as well as with adaptive combination tests based on the closure principle. The method is illustrated with a real-data example.
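The conditional error rate of the adaptive procedure is anchored to the single-stage Dunnett test. As a point of reference, the sketch below computes the one-sided single-stage Dunnett critical value for k treatments versus a shared control (known variance, equal allocation, hence pairwise correlation 0.5); it illustrates the classical test only, not the adaptive extension.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal

def dunnett_critical_value(k, alpha=0.025):
    """One-sided single-stage Dunnett critical value for k treatments vs. one
    shared control (known variance, equal allocation => pairwise correlation 0.5)."""
    corr = np.full((k, k), 0.5)
    np.fill_diagonal(corr, 1.0)
    mvn = multivariate_normal(mean=np.zeros(k), cov=corr)
    # choose c so that P(max_i Z_i <= c) = 1 - alpha under the global null
    return brentq(lambda c: mvn.cdf(np.full(k, c)) - (1 - alpha), 1.0, 6.0)

print(round(dunnett_critical_value(k=2), 3))  # noticeably above the unadjusted 1.96
```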

9.
10.
The proportion of the genome that is shared identical by descent (IBD) between pairs of individuals is often estimated in studies involving genome-wide SNP data. These estimates can be used to check pedigrees, estimate heritability, and adjust association analyses. We focus on the method of moments technique as implemented in PLINK [Purcell et al., 2007] and other software that estimates the proportions of the genome at which two individuals share 0, 1, or 2 alleles IBD. This technique is based on the assumption that the study sample is drawn from a single, homogeneous, randomly mating population. This assumption is violated if pedigree founders are drawn from multiple populations or include admixed individuals. In the presence of population structure, the method of moments estimator has an inflated variance and can be biased because it relies on sample-based allele frequency estimates. In the case of the PLINK estimator, which truncates genome-wide sharing estimates at zero and one to generate biologically interpretable results, the bias is most often towards overestimation of relatedness between ancestrally similar individuals. Using simulated pedigrees, we are able to demonstrate and quantify the behavior of the PLINK method of moments estimator under different population structure conditions. We also propose a simple method based on SNP pruning for improving genome-wide IBD estimates when the assumption of a single, homogeneous population is violated.
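A pruning-then-estimation workflow of the kind the abstract alludes to might look like the following sketch, which shells out to PLINK 1.9 from Python. The pruning parameters (window 50, step 5, r² threshold 0.2) and file names are illustrative assumptions, not the authors' recommended settings, and PLINK is assumed to be on the PATH.

```python
import subprocess

def prune_then_ibd(bfile="mydata", out_prefix="ibd_pruned"):
    """LD-prune SNPs before method-of-moments IBD estimation, in the spirit of
    the pruning idea above. Assumes PLINK 1.9 is on the PATH and that
    'mydata.bed/.bim/.fam' exist; names and thresholds are placeholders."""
    # step 1: sliding-window LD pruning (window of 50 SNPs, step 5, drop r^2 > 0.2)
    subprocess.run(["plink", "--bfile", bfile,
                    "--indep-pairwise", "50", "5", "0.2",
                    "--out", out_prefix], check=True)
    # step 2: pairwise IBD estimates (Z0/Z1/Z2, PI_HAT) on the pruned SNP set
    subprocess.run(["plink", "--bfile", bfile,
                    "--extract", f"{out_prefix}.prune.in",
                    "--genome", "--out", out_prefix], check=True)

# prune_then_ibd()  # uncomment once the PLINK binary files are in place
```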

11.
When a clinical trial is subject to a series of interim analyses, as a result of which the study may be terminated or modified, the final frequentist analyses need to take account of the design used. Failure to do so may result in overstated levels of significance, biased effect estimates and confidence intervals with inadequate coverage probabilities. A wide variety of valid methods of frequentist analysis have been devised for sequential designs comparing a single experimental treatment with a single control treatment. It is less clear how to perform the final analysis of a sequential or adaptive design applied in a more complex setting, for example, to determine which treatment or set of treatments amongst several candidates should be recommended. This article has been motivated by consideration of a trial in which four treatments for sepsis are to be compared, with interim analyses allowing the dropping of treatments or termination of the trial to declare a single winner or to conclude that there is little difference between the treatments that remain. The approach taken is based on the method of Rao-Blackwellization, which enhances the accuracy of unbiased estimates available from the first interim analysis by taking their conditional expectations given the final sufficient statistics. Analytic approaches to determining such expectations are difficult and specific to the details of the design; instead, “reverse simulations” are conducted to construct replicate realizations of the first interim analysis from the final test statistics. The method also provides approximate confidence intervals for the differences between treatments.

12.
In phase 3 clinical trials, ethical and financial concerns motivate sequential analyses in which the data are analyzed prior to completion of the entire planned study. Existing group sequential software accounts for the effects of these interim analyses on the sampling density by assuming that the contribution of subsequent increments is independent of the contribution from previous data. This independent increment assumption is satisfied in many common circumstances, including when using the efficient estimator. However, certain circumstances may dictate using an inefficient estimator, and the independent increment assumption may then be violated. The consequences of assuming independent increments in a setting where the assumption does not hold have not been previously explored. One important setting in which independent increments may not hold is that of longitudinal clinical trials. This paper considers dependent increments that arise because of heteroscedastic and correlated data in the context of longitudinal clinical trials that use a generalized estimating equation (GEE) approach. Both heteroscedasticity over time and correlation of observations within subjects may lead to departures from the independent increment assumption when using GEE. We characterize situations leading to greater departures. Despite violations of the independent increment assumption, simulation results suggest that the operating characteristics of sequential designs are largely maintained for typically observed patterns of accrual, correlation, and heteroscedasticity, even when the analysis relies on standard software that assumes an independent increment structure. More extreme scenarios may require greater care to avoid departures from the nominal type I error rate and power. Copyright © 2014 John Wiley & Sons, Ltd.

13.
We consider methodological problems in evaluating long-term survival in clinical trials. In particular, we examine the use of several methods that extend the basic Cox regression analysis. With long-term observation, the proportional hazards (PH) assumption may easily be violated, and a few long-term survivors may have a large effect on parameter estimates. We consider both model selection and robust estimation in a data set of 474 ovarian cancer patients enrolled in a clinical trial and followed for between 7 and 12 years after randomization. Two diagnostic plots for assessing goodness-of-fit are introduced. One shows the variation of the parameter estimates over time and is an alternative to checking PH with time-dependent covariates. The other takes advantage of the martingale residual process in time to represent the lack of fit with a metric of the type ‘observed minus expected’ number of events. Robust estimation is carried out by maximizing a weighted partial likelihood, which downweights the contribution of influential observations to estimation. This type of complementary analysis of the long-term results of clinical studies is useful in assessing the soundness of conclusions about the treatment effect. In the example analysed here, the difference in survival between treatments was mostly confined to those individuals who survived at least two years beyond randomization.

14.
When designing a group sequential clinical trial in the pharmaceutical industry setting, we often face the problem of determining the timing of an interim analysis. For a two-stage trial we compute the sample sizes n and N per treatment group for the interim and final analyses, respectively, that minimize the average trial size for a specified overall power. We consider this optimization when the trial is monitored using a Lan–DeMets α-spending function. Two additional problems considered in this context are (i) finding the sample sizes that maximize the overall power for a specified average trial size, and (ii) finding the sample sizes that achieve specified powers at the interim and final analyses.
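A simplified version of this optimization can be sketched directly. The code below assumes a known-variance two-sample Z setting with efficacy-only stopping, an O'Brien-Fleming-type Lan–DeMets spending function, and a coarse grid over the interim fraction; it illustrates the kind of search described in the abstract rather than reproducing the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

def of_spend(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type one-sided alpha-spending function."""
    return 2 * (1 - norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t)))

def boundaries(t, alpha=0.025):
    """Interim and final critical values when of_spend(t) is spent at the look."""
    c1 = norm.ppf(1 - of_spend(t, alpha))
    corr = np.sqrt(t)
    mvn = multivariate_normal(np.zeros(2), [[1.0, corr], [corr, 1.0]])
    # overall type I error = 1 - P(Z1 <= c1, Z2 <= c2) = alpha
    c2 = brentq(lambda c: mvn.cdf(np.array([c1, c])) - (1 - alpha), 1.0, 6.0)
    return c1, c2

def power_and_asn(t, N, delta, sigma=1.0, alpha=0.025):
    """Power and expected per-group size (efficacy-only stopping) under H1."""
    c1, c2 = boundaries(t, alpha)
    th1 = delta * np.sqrt(t * N / 2) / sigma   # mean of the interim Z under H1
    th2 = delta * np.sqrt(N / 2) / sigma       # mean of the final Z under H1
    corr = np.sqrt(t)
    mvn = multivariate_normal([th1, th2], [[1.0, corr], [corr, 1.0]])
    power = 1 - mvn.cdf(np.array([c1, c2]))
    asn = t * N + (1 - t) * N * norm.cdf(c1 - th1)
    return power, asn

def optimal_two_stage(delta=0.3, sigma=1.0, alpha=0.025, target=0.9):
    """Grid search for the interim fraction minimizing the expected trial size."""
    best = None
    for t in np.arange(0.3, 0.91, 0.1):
        # smallest final per-group size N achieving the target power at this look time
        N = brentq(lambda N: power_and_asn(t, N, delta, sigma, alpha)[0] - target,
                   10, 5000)
        asn = power_and_asn(t, N, delta, sigma, alpha)[1]
        if best is None or asn < best[2]:
            best = (round(t, 2), int(np.ceil(N)), round(asn, 1))
    return best  # (interim fraction, final N per group, expected size per group)

print(optimal_two_stage())
```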

15.
Traditional methods of sample size and power calculation in clinical trials with a time-to-event endpoint are based on the logrank test (and its variations), the Cox proportional hazards (PH) assumption, or a comparison of the means of two exponential distributions. Of these, sample size calculation based on the PH assumption is likely the most common and allows adjusting for the effect of one or more covariates. However, when designing a trial, there are situations in which the assumption of PH may not be appropriate. Additionally, when it is known that there is a rapid decline in the survival curve for a control group, such as from previously conducted observational studies, a design based on the PH assumption may confer only a minor statistical improvement for the treatment group that is neither clinically nor practically meaningful. For such scenarios, a clinical trial design that focuses on improvement in patient longevity is proposed, based on the concept of proportional time and using the generalized gamma ratio distribution. Simulations are conducted to evaluate the performance of the proportional time method and to identify the situations in which such a design will be beneficial compared with the standard design using a PH assumption, a piecewise exponential hazards assumption, and specific cases of a cure rate model. A practical example in which hemorrhagic stroke patients are randomized to one of two arms in a putative clinical trial demonstrates the usefulness of this approach by drastically reducing the number of patients needed for study enrollment.

16.
Interim analyses are routinely used to monitor accumulating data in clinical trials. When the objective of the interim analysis is to stop the trial if it is deemed futile, the analysis must ideally be conducted as early as possible. In trials where the clinical endpoint of interest is only observed after a long follow-up, many enrolled patients may therefore have no information on the primary endpoint available at the time of the interim analysis. To facilitate earlier decision-making, one may incorporate early response data that are predictive of the primary endpoint (e.g., an assessment of the primary endpoint at an earlier time) in the interim analysis. Most attention so far has been given to the development of interim test statistics that include such short-term endpoints, but not to decision procedures. Existing tests, moreover, perform poorly when the information is scarce, e.g., due to rare events, when the cohort of patients with observed primary endpoint data is small, or when the short-term endpoint is a strong but imperfect predictor. In view of this, we develop an interim decision procedure based on the conditional power approach that utilizes the short-term and long-term binary endpoints in a framework that is expected to provide reliable inferences, even when the primary endpoint is only available for a few patients, and that has the added advantage of allowing the use of historical information. The operating characteristics of the proposed procedure are evaluated for the phase III clinical trial that motivated this approach, using simulation studies.
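One way to make the idea concrete is a Monte Carlo conditional-power calculation in which interim patients with only a short-term response contribute through an assumed predictive probability of eventual success. The sketch below is a simplified stand-in for the proposed procedure; the predictive probabilities (which could come from historical data), sample sizes, and interim counts are all assumptions.

```python
import numpy as np
from scipy.stats import norm

def cp_with_short_term(n_final, obs_long, obs_short,
                       p_resp=0.7, p_nonresp=0.2,
                       alpha=0.025, n_sim=20_000, seed=3):
    """Monte Carlo conditional power for a final two-proportion comparison.
    obs_long[arm]  = (patients with long-term endpoint observed, successes)
    obs_short[arm] = (patients with only a short-term response, responders)
    p_resp / p_nonresp are assumed probabilities of long-term success given the
    short-term result (e.g. from historical data); n_final is the per-arm size."""
    rng = np.random.default_rng(seed)
    reject = 0
    for _ in range(n_sim):
        final = {}
        for arm in ("trt", "ctl"):
            n_l, s_l = obs_long[arm]
            n_s, r_s = obs_short[arm]
            # impute long-term outcomes of the short-term-only patients
            s_imp = (rng.binomial(r_s, p_resp)
                     + rng.binomial(n_s - r_s, p_nonresp))
            p_hat = (s_l + s_imp) / (n_l + n_s)                # current long-term rate
            s_fut = rng.binomial(n_final - n_l - n_s, p_hat)   # not-yet-observed patients
            final[arm] = s_l + s_imp + s_fut
        p1, p0 = final["trt"] / n_final, final["ctl"] / n_final
        pbar = (p1 + p0) / 2
        se = np.sqrt(2 * pbar * (1 - pbar) / n_final)
        if se > 0 and (p1 - p0) / se > norm.ppf(1 - alpha):
            reject += 1
    return reject / n_sim

print(cp_with_short_term(200, {"trt": (40, 22), "ctl": (40, 14)},
                              {"trt": (30, 18), "ctl": (30, 12)}))
```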

17.
Nam JM. Statistics in Medicine 2006;25(9):1521-1531.
In this paper, we assess the performance of homogeneity tests for two or more kappa statistics when prevalence rates across reliability studies are assumed to be equal. The likelihood score method and the chi-square goodness-of-fit (GOF) test provide type I error rates that are satisfactorily close to the nominal level, but a Fleiss-like test is not satisfactory for small or moderate sample sizes. Simulations show that the score test is more powerful than the chi-square GOF test, and the approximate sample size required for a specific power is substantially smaller for the former than for the latter. In addition, the score test is robust to deviations from the equal prevalence assumption, while the GOF test is highly sensitive and may give a grossly misleading type I error rate when the assumption of equal prevalence is violated. We conclude that the homogeneity score test is the preferred method.
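As a point of comparison, a generic inverse-variance (Cochran-Q-style) chi-square statistic for homogeneity of several independent kappa estimates can be written in a few lines. This is a common textbook construction, not necessarily the specific GOF or score test evaluated in the paper, and the example kappas and variances are made up.

```python
import numpy as np
from scipy.stats import chi2

def kappa_homogeneity_chi2(kappas, variances):
    """Generic inverse-variance chi-square test of homogeneity for several
    independent kappa estimates (Cochran-Q-style); not necessarily the exact
    GOF or score statistic studied in the paper."""
    k = np.asarray(kappas, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    kappa_bar = np.sum(w * k) / np.sum(w)     # inverse-variance pooled kappa
    q = np.sum(w * (k - kappa_bar) ** 2)      # ~ chi-square on len(k)-1 df under H0
    return q, chi2.sf(q, df=len(k) - 1)

print(kappa_homogeneity_chi2([0.55, 0.62, 0.40], [0.004, 0.006, 0.005]))
```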

18.
Sequential analysis in clinical and epidemiological research
In clinical trials and epidemiological studies, interim analyses are usually performed for ethical or economic reasons, although efficiency can also be a motivation. The analysis is referred to as group-sequential or continuous-sequential analysis, depending on the number of interim analyses performed. Repeated testing of cumulative data requires an adjustment to the significance level, or type I error alpha, of a statistical test. On average, a (group-)sequential analysis of a clinical or epidemiological study requires fewer patients than the analysis of a fixed-sample study.

19.
We consider a study starting with two treatment groups and a control group, with a planned interim analysis. The inferior treatment group is dropped after the interim analysis, and only the winning treatment and the control continue to the end of the study. This 'Two-Stage Winner Design' is based on the concepts of multiple comparison, adaptive design, and winner selection. In a study with such a design, there is less multiplicity, but more adaptability, if the interim selection is performed at an early stage. If the interim selection is performed close to the end of the study, the situation becomes the conventional multiple comparison setting, where Dunnett's method may be applied. The unconditional distribution of the final test statistic for the 'winner' treatment is no longer normal; its exact distribution is provided in this paper, but numerical integration is needed for its calculation. To avoid complex computations, we propose a normal approximation approach to calculate the type I error, the power, the point estimate, and the confidence intervals. Owing to the well-understood and attractive properties of the normal distribution, the 'Winner Design' can be easily planned and adequately executed, as demonstrated by an example. We also provide a detailed discussion of how the proposed design should be implemented in practice by optimizing the timing of the interim look and the probability of winner selection.
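The need for an adjusted critical value (or the proposed normal approximation) can be seen from a quick simulation of the naive procedure. The sketch below estimates the type I error of a two-stage winner design in which the interim winner is tested at the end against the unadjusted critical value; the sample sizes are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def winner_design_type1(n_interim=100, n_final=300, alpha=0.025,
                        n_sim=200_000, seed=11):
    """Monte Carlo type I error of a naive two-stage winner design: two
    treatments vs. a shared control, the better arm at the interim continues,
    and the winner's final Z is compared with the unadjusted critical value."""
    rng = np.random.default_rng(seed)
    t = n_interim / n_final                 # information fraction at the interim look
    cov = [[1.0, 0.5], [0.5, 1.0]]          # correlation 0.5 from the shared control
    z1 = rng.multivariate_normal([0, 0], cov, size=n_sim)     # interim Zs under H0
    zinc = rng.multivariate_normal([0, 0], cov, size=n_sim)   # stage-2 increments
    zfinal = np.sqrt(t) * z1 + np.sqrt(1 - t) * zinc          # final Zs for both arms
    winner = np.argmax(z1, axis=1)                            # interim winner selection
    z_win = zfinal[np.arange(n_sim), winner]
    return np.mean(z_win > norm.ppf(1 - alpha))

print(round(winner_design_type1(), 4))  # above the nominal 0.025 without adjustment
```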

20.
A simple adjustment to the Pearson chi-square test has been proposed for comparing proportions estimated from clustered binary observations. However, the assumptions needed to ensure the validity of this test have not yet been thoroughly addressed. These assumptions will hold for experimental comparisons but could be violated for some observational comparisons. In this paper we investigate the conditions under which the adjusted chi-square statistic is valid and examine its performance when these assumptions are violated. We also introduce some alternative test statistics that do not require these assumptions. The test statistics considered are then compared through simulation, and an example based on real data is presented. The simulation study shows that the adjusted chi-square statistic generally produces empirical type I error rates close to the nominal level under the assumption of a common intracluster correlation coefficient. Even if the intracluster correlations differ, the adjusted chi-square statistic performs well when the groups have equal numbers of clusters.
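A Donner-style design-effect adjustment of the Pearson chi-square, with a pooled ANOVA estimate of the intracluster correlation, can be sketched as follows. This is one common form of the adjustment and may differ in detail from the statistic studied in the paper; the cluster data in the example are a toy illustration.

```python
import numpy as np
from scipy.stats import chi2

def adjusted_chi2(groups):
    """Design-effect-adjusted Pearson chi-square for comparing proportions from
    clustered binary data (Donner-style adjustment with a pooled ANOVA ICC).
    `groups` is a list (one entry per arm) of lists of (cluster_size, successes).
    A sketch of the general approach; details may differ from the paper."""
    g = len(groups)
    sizes = [np.array([m for m, _ in grp], float) for grp in groups]
    succ = [np.array([y for _, y in grp], float) for grp in groups]
    N = np.array([s.sum() for s in sizes])              # observations per group
    Y = np.array([y.sum() for y in succ])               # successes per group
    M, k = N.sum(), sum(len(grp) for grp in groups)     # total obs, total clusters
    p_bar = Y.sum() / M                                 # pooled proportion
    # ANOVA estimator of a common intracluster correlation, pooled across groups
    p_grp = Y / N
    p_clu = [y / m for y, m in zip(succ, sizes)]
    ssb = sum((m * (pc - pg) ** 2).sum() for m, pc, pg in zip(sizes, p_clu, p_grp))
    ssw = sum((m * pc * (1 - pc)).sum() for m, pc in zip(sizes, p_clu))
    msb, msw = ssb / (k - g), ssw / (M - k)
    m0 = (M - sum((m ** 2).sum() / n for m, n in zip(sizes, N))) / (k - g)
    icc = max(0.0, (msb - msw) / (msb + (m0 - 1) * msw))
    # group-level design effects (variance inflation factors)
    C = np.array([(m * (1 + (m - 1) * icc)).sum() / n for m, n in zip(sizes, N)])
    x2 = ((Y - N * p_bar) ** 2 / (C * N * p_bar * (1 - p_bar))).sum()
    return x2, chi2.sf(x2, df=g - 1), icc

# toy example: 2 arms, 4 clusters each, entries are (cluster size, successes)
arm_a = [(20, 12), (15, 9), (25, 11), (18, 10)]
arm_b = [(22, 8), (17, 6), (19, 9), (21, 7)]
print(adjusted_chi2([arm_a, arm_b]))
```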

