Similar Articles (20 results found)
1.
Propensity score methods are increasingly being used to estimate causal treatment effects in the medical literature. Conditioning on the propensity score results in unbiased estimation of the expected difference in observed responses to two treatments. The degree to which conditioning on the propensity score introduces bias into the estimation of the conditional odds ratio or conditional hazard ratio, which are frequently used as measures of treatment effect in observational studies, has not been extensively studied. We conducted Monte Carlo simulations to determine the degree to which propensity score matching, stratification on the quintiles of the propensity score, and covariate adjustment using the propensity score result in biased estimation of conditional odds ratios, hazard ratios, and rate ratios. We found that conditioning on the propensity score resulted in biased estimation of the true conditional odds ratio and the true conditional hazard ratio. In all scenarios examined, treatment effects were biased towards the null treatment effect. However, conditioning on the propensity score did not result in biased estimation of the true conditional rate ratio. In contrast, conventional regression methods allowed unbiased estimation of the true conditional treatment effect when all variables associated with the outcome were included in the regression model. The observed bias in propensity score methods is due to the fact that regression models allow one to estimate conditional treatment effects, whereas propensity score methods allow one to estimate marginal treatment effects. In several settings with non-linear treatment effects, marginal and conditional treatment effects do not coincide.
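The non-collapsibility at the heart of this finding is easy to reproduce in a small simulation. The sketch below (hypothetical, not the authors' simulation design) randomizes treatment, generates a binary outcome from a logistic model with a true conditional odds ratio of 2.0 and a strong prognostic covariate, and then contrasts the covariate-adjusted (conditional) estimate with the unadjusted (marginal) estimate; even without confounding, the marginal odds ratio sits closer to the null.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2024)
n = 200_000

# Randomized treatment, so there is no confounding: any gap between the two
# estimates below reflects non-collapsibility, not bias from confounding.
treat = rng.binomial(1, 0.5, n)
x = rng.normal(0, 1, n)                              # prognostic covariate
log_odds = -0.5 + np.log(2.0) * treat + 1.5 * x      # true conditional OR = 2.0
y = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Conditional OR: logistic regression adjusting for the covariate
cond = sm.Logit(y, sm.add_constant(np.column_stack([treat, x]))).fit(disp=0)
# Marginal OR: logistic regression ignoring the covariate
marg = sm.Logit(y, sm.add_constant(treat)).fit(disp=0)

print("conditional OR:", np.exp(cond.params[1]))     # close to 2.0
print("marginal OR:   ", np.exp(marg.params[1]))     # attenuated towards 1
```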

2.
The Cochran-Armitage trend test has been used in case-control studies for testing genetic association. As the variance of the test statistic is a function of unknown parameters, e.g. disease prevalence and allele frequency, it must be estimated. The usual estimator combining data for cases and controls assumes they follow the same distribution under the null hypothesis. Under the alternative hypothesis, however, the cases and controls follow different distributions. Thus, the power of the trend tests may be affected by the variance estimator used. In particular, the usual method combining both cases and controls is not an asymptotically unbiased estimator of the null variance when the alternative is true. Two different estimates of the null variance are available which are consistent under both the null and alternative hypotheses. In this paper, we examine the sample size and small-sample power performance of trend tests that are optimal for three common genetic models, as well as a robust trend test, based on the three estimates of the variance, and provide guidelines for choosing an appropriate test.
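For reference, a minimal sketch of the Cochran-Armitage trend statistic with the usual pooled ("null") variance estimator combining cases and controls is given below; the two alternative variance estimators examined in the paper, and the robust test, are not reproduced. The genotype counts and additive scores (0, 1, 2) are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def catt(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test with the usual pooled ('null') variance.

    cases, controls: genotype counts (e.g. [aa, Aa, AA]) in cases and controls.
    """
    r = np.asarray(cases, float)
    s = np.asarray(controls, float)
    x = np.asarray(scores, float)
    R, S = r.sum(), s.sum()
    n = r + s
    N = R + S
    # Score statistic: covariance between case status and genotype score
    U = (S * (x * r).sum() - R * (x * s).sum()) / N
    # Pooled variance estimate, valid under H0 (cases and controls share one distribution)
    p = n / N
    var = R * S / N * ((x**2 * p).sum() - ((x * p).sum()) ** 2)
    z = U / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

z, pval = catt(cases=[30, 60, 40], controls=[55, 95, 50])
print(f"Z = {z:.3f}, two-sided p = {pval:.4f}")
```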

3.
Allocating a proportion k' = 1/(1 + √r0) of subjects to an intervention is a practical approach to approximately maximize power for testing whether an intervention reduces the relative risk of disease below a null ratio r0 < 1. Furthermore, allocating k'_s, a convenient fraction close to k', to intervention performs nearly as well; for example, allocating k'_s = 3/5 for 0.5 ≥ r0 > 0.33, 2/3 for 0.33 ≥ r0 > 0.17, and 3/4 for 0.17 ≥ r0 ≥ 0.10. Both k' and k'_s are easily calculated and invariant to alterations in disease rate estimates under the null and alternative hypotheses, when r0 remains constant. In the examples we studied, allocating k' (or k'_s) subjects to intervention achieved close to the minimum possible sample size, given test size and power (equivalently, maximum power, given test size and sample size), for likelihood score tests. Compared to equal allocation, k' and k'_s reduced sample sizes by amounts ranging from approximately 5.5 per cent for r0 = 0.50 to approximately 24 per cent for r0 = 0.10. These sample size savings may be particularly important for large studies of prophylactic interventions such as vaccines. While k' was derived from variance minimization for an arcsine transformation, we do not recommend the arcsine test, since its true size exceeded the nominal value. In contrast, the true size of the uncorrected score test was less than the nominal size. A skewness correction made the size of the score test very close to the nominal level and slightly increased power. We recommend using the score test, or the skewness-corrected score test, for planning studies designed to show that a ratio of proportions is less than a prespecified null ratio r0 < 1.
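The allocation fraction itself is a one-line computation; the sketch below simply evaluates k' = 1/(1 + √r0) at the null ratios mentioned in the abstract and notes the convenient fractions k'_s that lie close to it.

```python
from math import sqrt

def optimal_allocation(r0: float) -> float:
    """Approximate power-maximizing fraction of subjects on intervention
    for testing that the relative risk is below the null ratio r0 < 1."""
    return 1.0 / (1.0 + sqrt(r0))

for r0 in (0.50, 0.33, 0.17, 0.10):
    print(f"r0 = {r0:4.2f}: k' = {optimal_allocation(r0):.3f}")
# Convenient nearby fractions from the abstract: 3/5 (0.50 >= r0 > 0.33),
# 2/3 (0.33 >= r0 > 0.17), 3/4 (0.17 >= r0 >= 0.10).
```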

4.
This paper compares the sample size formulae given by Schoenfeld, Freedman, Hsieh and Shuster for unbalanced designs. Freedman's formula predicts the highest power for the logrank test when the sample size ratio of the two groups equals the reciprocal of the hazard ratio. The other three formulae predict the highest power when the sample sizes in the two groups are equal. Results of Monte Carlo simulations performed for the power of the logrank test with various sample size ratios show that the power curve of the logrank test is almost flat between a sample size ratio of one and a sample size ratio close to the reciprocal of the hazard ratio. An equal sample-size allocation may therefore not maximize the power of the logrank test. Monte Carlo simulations also show that, under an exponential model, when the sample size ratio is close to the reciprocal of the hazard ratio, Freedman's formula predicts powers more accurately. Schoenfeld's formula, however, seems best for predicting power with equal sample sizes.
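The contrast between the formulae can be sketched in terms of required numbers of events. The functions below follow the standard textbook forms of Schoenfeld's and Freedman's event-number formulae for an allocation ratio φ = n2/n1; the exact parameterizations compared in the paper may differ, and the direction in which the ratio should be skewed depends on which group carries the lower hazard.

```python
from math import log
from scipy.stats import norm

def events_schoenfeld(hr, alpha=0.05, power=0.80, ratio=1.0):
    """Schoenfeld-type required number of events; ratio = n2/n1."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    p1 = 1.0 / (1.0 + ratio)          # proportion allocated to group 1
    p2 = 1.0 - p1
    return z**2 / (p1 * p2 * log(hr) ** 2)

def events_freedman(hr, alpha=0.05, power=0.80, ratio=1.0):
    """Freedman-type required number of events; ratio = n2/n1."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z**2 * (1 + ratio * hr) ** 2 / (ratio * (1 - hr) ** 2)

# Equal allocation vs. a 2:1 ratio (the reciprocal of HR = 0.5):
for ratio in (1.0, 2.0):
    print(ratio,
          round(events_schoenfeld(0.5, ratio=ratio), 1),
          round(events_freedman(0.5, ratio=ratio), 1))
```

In this parameterization Freedman's requirement is minimized at ratio = 1/HR while Schoenfeld's is minimized at equal allocation, which mirrors the behaviour described in the abstract.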

5.
Ahnn and Anderson derived sample size formulae for unstratified and stratified designs assuming equal allocation of subjects to three or more treatment groups. We generalize the sample size formulae to allow for unequal allocation. In addition, we define the overall probability of death to be equal to one minus the censored proportion for the stratified design. This definition also leads to a slightly different definition of the non-centrality parameter than that of Ahnn and Anderson for the stratified case. Assuming proportional hazards, sample sizes are determined for a prespecified power, significance level, hazard ratios, allocation of subjects to several treatment groups, and known censored proportion. In the proportional hazards setting, three cases are considered: (1) exponential failures with exponential censoring, (2) exponential failures with uniform censoring, and (3) Weibull failures (assuming the same shape parameter for all groups) with uniform censoring. In all three cases of the unstratified design, it is assumed that the censoring distribution is the same for all of the treatment groups. For the stratified log-rank test, it is assumed that the censoring distribution is the same across the treatment groups and the strata. Further, formulae have been developed to provide approximate powers for the test, based upon the first two or first four moments of the asymptotic distribution. We observe the following two major findings based on the simulations. First, the simulated power of the log-rank test does not depend on the censoring mechanism. Second, for a significance level of 0.05 and power of 0.80, the required sample size n is independent of the censoring pattern. Moreover, there is very close agreement between the exact (asymptotic) and simulated powers when a sequence of alternatives is close to the null hypothesis. Two-moment and four-moment power series approximations also yield powers in close agreement with the exact (asymptotic) power. With unequal allocation, our simulations show that the empirical powers are consistently above the target value of the prespecified power of 0.80 when 50 per cent of the patients are allocated to the treatment group with the smallest hazard.

6.
J Benichou, E Bellissant, C Chastang. Statistics in Medicine, 1991, 10(6): 989-990.
Phase II cancer clinical trials are non-comparative trials designed to determine whether the response rate p to the treatment under study is greater than a certain value p0, that is, to test H0: p ≤ p0 against H1: p > p0. By choosing the type I error alpha and the power 1-beta and by specifying H1, that is, by choosing a clinically relevant improvement p1, one can compute the number of patients N to be included for a fixed-sample approach. Various other approaches have been proposed, such as multistage methods and Wald's continuous sequential probability ratio test (SPRT). As an alternative approach, we extended the triangular test (TT), proposed by Whitehead for comparative trials, to the situation of non-comparative trials with a binary outcome. We expressed H0 and H1 in terms of the log odds-ratio statistic, namely log[p(1-p0)/(p0(1-p))]. With this choice, the two statistics of interest, Z and V, have simple expressions: Z is the difference between the observed number of positive outcomes and the expected number under H0, and V is the variance of Z under H0. After every group of n patients, Z is plotted against V, and the trial proceeds until a boundary is crossed. In our simulations, the type I error alpha and the power 1-beta were close to their nominal values with the TT, and the average sample size was close to that of Wald's continuous SPRT and compared favourably with the multistage methods proposed by Herson and Fleming. Given its statistical properties and its ease of use, the TT should be considered for planning and analysing cancer phase II trials.
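The two monitoring statistics are simple to compute. A minimal sketch for the single-arm binary case follows, with hypothetical cumulative response counts; the triangular continuation boundaries themselves, which depend on alpha, beta, p0 and p1, are omitted.

```python
# Z and V after each group of patients, as used in the triangular test for a
# single-arm binary outcome (boundary equations omitted; they depend on the
# design parameters alpha, beta, p0 and p1).
def z_and_v(successes: int, n: int, p0: float):
    """Efficient-score statistics for H0: p <= p0 against H1: p > p0."""
    z = successes - n * p0          # observed minus expected under H0
    v = n * p0 * (1 - p0)           # variance of Z under H0
    return z, v

# Example: inspect the sample path after every group of 5 patients,
# with p0 = 0.20 and hypothetical cumulative response counts.
p0 = 0.20
for n, successes in [(5, 2), (10, 4), (15, 7)]:
    print(n, z_and_v(successes, n, p0))
```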

7.
The importance of post-marketing surveillance for drug and vaccine safety is well recognized, as rare but serious adverse events may not be detected in pre-approval clinical trials. In such surveillance, a sequential test is preferable, in order to detect potential problems as soon as possible. Various sequential probability ratio tests (SPRT) have been applied in near real-time vaccine and drug safety surveillance, including Wald's classical SPRT with a single alternative and the Poisson-based maximized SPRT (MaxSPRT) with a composite alternative. These methods require that the expected number of events under the null hypothesis is known as a function of time t. In practice, the expected counts are usually estimated from historical data. When a large sample size from the historical data is lacking, the SPRTs are biased due to the variance in the estimate of the expected number of events. We present a conditional maximized sequential probability ratio test (CMaxSPRT), which adjusts for the uncertainty in the expected counts. Our test incorporates the randomness and variability from both the historical data and the surveillance population. Evaluations of the statistical power of CMaxSPRT are presented under different scenarios.
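For orientation, the Poisson-based MaxSPRT that CMaxSPRT extends monitors the log-likelihood ratio sketched below, where the expected count under H0 is treated as known; the numbers are hypothetical. CMaxSPRT replaces this known expectation with an estimate from limited historical data and adjusts for that extra uncertainty (its statistic is not reproduced here).

```python
from math import log

def poisson_maxsprt_llr(c: int, expected: float) -> float:
    """Log-likelihood ratio monitored by the Poisson-based MaxSPRT:
    the likelihood ratio maximized over relative risks >= 1, given c observed
    events and 'expected' events under H0 (assumed known)."""
    if c <= expected:
        return 0.0
    return c * log(c / expected) - (c - expected)

# Hypothetical surveillance snapshot: 12 events observed where 6.4 were expected.
print(round(poisson_maxsprt_llr(12, 6.4), 3))
```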

8.
Any genome-wide analysis is hampered by reduced statistical power due to multiple comparisons. This is particularly true for interaction analyses, which have lower statistical power than analyses of associations. To assess gene–environment interactions in population settings we have recently proposed a statistical method based on a modified two-step approach, where first genetic loci are selected by their associations with disease and environment, respectively, and subsequently tested for interactions. We have simulated various data sets resembling real world scenarios and compared single-step and two-step approaches with respect to true positive rate (TPR) in 486 scenarios and (study-wide) false positive rate (FPR) in 252 scenarios. Our simulations confirmed that in all two-step methods the two steps are not correlated. In terms of TPR, two-step approaches combining information on gene-disease association and gene–environment association in the first step were superior to all other methods, while preserving a low FPR in over 250 million simulations under the null hypothesis. Our weighted modification yielded the highest power across various degrees of gene–environment association in the controls. An optimal threshold for step 1 depended on the interacting allele frequency and the disease prevalence. In all scenarios, the least powerful method was to proceed directly to an unbiased full interaction model, applying conventional genome-wide significance thresholds. This simulation study confirms the practical advantage of two-step approaches to interaction testing over more conventional one-step designs, at least in the context of dichotomous disease outcomes and other parameters that might apply in real-world settings.

9.
Uncertainty surrounding the error covariance matrix often presents the biggest barrier to achieving accurate power analysis in the 'univariate' approach to repeated measures analysis of variance (UNIREP). A poor choice gives either an overpowered study which wastes resources, or an underpowered study with little chance of success. Internal pilot designs were introduced to resolve such uncertainty about error variance for t-tests. In earlier papers, we extended the use of internal pilots to any univariate linear model with fixed predictors and independent Gaussian errors. Here we further extend our exact and approximate results to UNIREP analysis. For a fixed treatment effect, the inaccuracy in a power calculation depends only on the ratio of the true variance to the value used for planning. The greater complexity of repeated measures requires generalizing misspecification of error variance to the misspecification of the eigenvalues of the error covariance. We recommend approximating the misspecification in terms of the first and second moments of the eigenvalues, for both fixed sample and internal pilot designs. We also describe an unadjusted approach for internal pilots with repeated measures. Simulations illustrate the fact that both positive and negative properties in the univariate setting extend to repeated measures analysis. In particular, internal pilots allow maintaining power or reducing expected sample size when the covariance matrix used for planning differs from the true value. However, an unadjusted approach can inflate test size, at least with small to moderate sample sizes. Hence new, adjusted methods must be developed for small samples. At this time, we caution against using an internal pilot design with repeated measures without first conducting simulations to document the amount of test size inflation possible for the conditions of interest.

10.
We investigate through computer simulations the robustness and power of two-group analysis of covariance tests applied to small samples distorted from normality by floor effects when the regression slopes are homogeneous. We consider four parametric analysis of covariance tests that vary according to the treatment of the homogeneity of regression slopes and two t-tests on unadjusted means and on difference scores. Under the null hypothesis of no difference in means, we estimated actual significance levels by comparing observed test statistics to appropriate values from the F and t distributions for nominal significance levels of 0.10, 0.05, 0.02 and 0.01. We estimated power by similar comparisons under various alternative hypotheses. The hierarchical approach (which adjusts for non-homogeneous slopes if found significant), the test that assumes homogeneous regression slopes, and the test that estimates separate regression slopes in each treatment were robust. In general, each test produced power at least equal to that expected from normal theory. The textbook approach, which does not test for mean differences when there is significant non-homogeneity, was conservative but also had good power. The t-tests were robust but had poorer power properties than the above procedures.

11.
Composite endpoints combine several events within a single variable, which increases the number of expected events and is thereby meant to increase the power. However, the interpretation of results can be difficult as the observed effect for the composite does not necessarily reflect the effects for the components, which may be of different magnitude or even point in adverse directions. Moreover, in clinical applications, the event types are often of different clinical relevance, which also complicates the interpretation of the composite effect. The common effect measure for composite endpoints is the all-cause hazard ratio, which gives equal weight to all events irrespective of their type and clinical relevance. Thereby, the all-cause hazard within each group is given by the sum of the cause-specific hazards corresponding to the individual components. A natural extension of the standard all-cause hazard ratio can be defined by a "weighted all-cause hazard ratio" where the individual hazards for each component are multiplied with predefined relevance weighting factors. For the special case of equal weights across the components, the weighted all-cause hazard ratio then corresponds to the standard all-cause hazard ratio. To identify the cause-specific hazard of the individual components, any parametric survival model might be applied. The new weighted effect measure can be tested for deviations from the null hypothesis by means of a permutation test. In this work, we systematically compare the new weighted approach to the standard all-cause hazard ratio by theoretical considerations, Monte Carlo simulations, and by means of a real clinical trial example.
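A minimal sketch of the weighted effect measure, assuming constant (exponential) cause-specific hazards so that each arm's weighted all-cause hazard is just a weighted sum; the hazards and relevance weights below are hypothetical.

```python
import numpy as np

def weighted_all_cause_hr(haz_treat, haz_ctrl, weights):
    """Weighted all-cause hazard ratio: cause-specific hazards of the
    components are multiplied by relevance weights before being summed
    within each arm. With equal weights and constant cause-specific hazards
    this reduces to the ordinary all-cause hazard ratio."""
    w = np.asarray(weights, float)
    return (w * np.asarray(haz_treat)).sum() / (w * np.asarray(haz_ctrl)).sum()

# Hypothetical composite: death and hospitalization, with constant hazards.
haz_treat = [0.010, 0.060]   # per-month cause-specific hazards, treatment arm
haz_ctrl  = [0.012, 0.050]   # per-month cause-specific hazards, control arm
print(weighted_all_cause_hr(haz_treat, haz_ctrl, weights=[1, 1]))   # unweighted
print(weighted_all_cause_hr(haz_treat, haz_ctrl, weights=[3, 1]))   # death weighted up
```

In this toy example the unweighted composite points in the unfavourable direction because the frequent, less severe component dominates, while up-weighting death moves the measure towards the effect on the clinically most relevant event.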

12.
In the analysis of composite endpoints in a clinical trial, time-to-first-event analysis techniques such as the logrank test and the Cox proportional hazards model do not take into account the multiplicity, importance, and severity of the events in the composite endpoint. Several generalized pairwise comparison analysis methods have been described recently that do allow these aspects to be taken into account. These methods have the additional benefit that all types of outcomes can be included, such as longitudinal quantitative outcomes, to evaluate the full treatment effect. Four of the generalized pairwise comparison methods, i.e., the Finkelstein-Schoenfeld, Buyse, unmatched Pocock, and adapted O'Brien tests, are summarized. They are compared to each other and to the logrank test by means of simulations, specifically evaluating the effect of correlation between components of the composite endpoint on the power to detect a treatment difference. These simulations show that the prioritized generalized pairwise comparison methods perform very similarly, are sensitive to the priority rank of the components in the composite endpoint, and do not measure the true treatment effect from the second priority-ranked component onward. The non-prioritized pairwise comparison test does not suffer from these limitations, and correlation affects only its variance.
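The core of the prioritized generalized pairwise comparison idea can be sketched as follows (a simplified illustration: censoring, thresholds of clinical relevance, and the tie-handling rules of the specific tests are omitted, and the data are hypothetical). Each treated patient is compared with each control patient on the highest-priority component first, deferring to lower-priority components only on ties, and the result is summarized as a net benefit.

```python
import numpy as np

def prioritized_net_benefit(treat, ctrl):
    """Generalized pairwise comparisons with prioritized components.
    treat/ctrl: arrays of shape (n, k); column 0 is the highest-priority
    component, ties on it defer to column 1, and so on. Larger values are
    better. Returns the net benefit: P(treated better) - P(control better)."""
    wins = losses = 0
    for a in treat:
        for b in ctrl:
            for x, y in zip(a, b):          # walk down the priority list
                if x > y:
                    wins += 1
                    break
                if x < y:
                    losses += 1
                    break
    n_pairs = len(treat) * len(ctrl)
    return (wins - losses) / n_pairs

# Hypothetical two-component endpoint: (survival time, quality-of-life score)
rng = np.random.default_rng(1)
treat = np.column_stack([rng.exponential(12, 50), rng.normal(60, 10, 50)])
ctrl  = np.column_stack([rng.exponential(10, 50), rng.normal(55, 10, 50)])
print(round(prioritized_net_benefit(treat, ctrl), 3))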

13.
In clinical trials with time-to-event outcomes, it is common to estimate the marginal hazard ratio from the proportional hazards model, even when the proportional hazards assumption is not valid. This is unavoidable from the perspective that the estimator must be specified a priori if probability statements about treatment effect estimates are desired. Marginal hazard ratio estimates under non-proportional hazards are still useful, as they can be considered to be average treatment effect estimates over the support of the data. However, as many have shown, under non-proportional hazards, the 'usual' unweighted marginal hazard ratio estimate is a function of the censoring distribution, which is not normally considered to be scientifically relevant when describing the treatment effect. In addition, in many practical settings, the censoring distribution is only conditionally independent (e.g., differing across treatment arms), which further complicates the interpretation. In this paper, we investigate an estimator of the hazard ratio that removes the influence of censoring and propose a consistent robust variance estimator. We compare the coverage probability of the estimator to both the usual Cox model estimator and an estimator proposed by Xu and O'Quigley (2000) when censoring is independent of the covariate. The new estimator should be used for inference that does not depend on the censoring distribution. It is particularly relevant to adaptive clinical trials where, by design, censoring distributions differ across treatment arms.

14.
Time-to-event analysis is frequently used in medical research to investigate potential disease-modifying treatments in neurodegenerative diseases. Potential treatment effects are generally evaluated using the logrank test, which has optimal power and sensitivity when the treatment effect (hazard ratio) is constant over time. However, there is generally no prior information as to how the hazard ratio for the event of interest actually evolves. In these cases, the logrank test is not necessarily the most appropriate to use. When the hazard ratio is expected to decrease or increase over time, alternative statistical tests, such as the Fleming-Harrington test, provide better sensitivity. An example of this comes from a large, five-year randomised, placebo-controlled prevention trial (GuidAge) in 2854 community-based subjects making spontaneous memory complaints to their family physicians, which evaluated whether treatment with EGb761® can modify the risk of developing Alzheimer's disease (AD). The primary outcome measure was the time to conversion from memory complaint to Alzheimer's type dementia. Although there was no significant difference in the hazard function of conversion between the two treatment groups according to the preplanned logrank test, a significant treatment-by-time interaction for the incidence of AD was observed in a protocol-specified subgroup analysis, suggesting that the hazard ratio is not constant over time. For this reason, additional post hoc analyses were performed using the Fleming-Harrington test to evaluate whether there was a signal of a late effect of EGb761®. Applying the Fleming-Harrington test, the hazard function for conversion to dementia in the placebo group was significantly different from that in the EGb761® treatment group (p = 0.0054), suggesting a late effect of EGb761®. Since this was a post hoc analysis, no definitive conclusions can be drawn as to the effectiveness of the treatment. This post hoc analysis illustrates the value of performing another randomised clinical trial of EGb761® explicitly testing the hypothesis of a late treatment effect, as well as of using better-adapted statistical approaches for long-term preventive trials when it is expected that prevention cannot have an immediate effect but rather a delayed effect that increases over time.
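A Fleming-Harrington G(ρ, γ) test is a weighted logrank test whose weights are built from the pooled Kaplan-Meier curve; with ρ = 0 and γ > 0 it down-weights early events and is therefore sensitive to late-emerging effects such as the one described above. The sketch below is a from-scratch illustration on hypothetical data with a delayed treatment effect, not an analysis of the GuidAge trial.

```python
import numpy as np
from scipy.stats import norm

def fleming_harrington_test(time, event, group, rho=0.0, gamma=1.0):
    """Weighted logrank (Fleming-Harrington G^{rho,gamma}) test.
    Weights S(t-)^rho * (1 - S(t-))^gamma from the pooled Kaplan-Meier curve;
    rho=0, gamma>0 up-weights late differences, rho=gamma=0 is the logrank test."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    group = np.asarray(group, int)          # 0/1 group labels
    times = np.unique(time[event == 1])     # distinct event times, ascending

    surv_left = 1.0                         # pooled KM just before current time
    num, var = 0.0, 0.0
    for t in times:
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()

        w = surv_left**rho * (1.0 - surv_left)**gamma
        num += w * (d1 - d * n1 / n)                      # observed - expected
        if n > 1:
            var += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv_left *= 1.0 - d / n                          # update pooled KM
    z = num / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

# Hypothetical two-arm data with a delayed treatment effect: identical
# hazard up to month 8, reduced hazard in the treated arm afterwards.
rng = np.random.default_rng(7)
n = 400
grp = rng.integers(0, 2, n)
t0 = rng.exponential(10, n)
t_event = np.where((grp == 1) & (t0 > 8), 8 + rng.exponential(20, n), t0)
t_cens = rng.uniform(0, 24, n)
obs = np.minimum(t_event, t_cens)
evt = (t_event <= t_cens).astype(int)
print(fleming_harrington_test(obs, evt, grp, rho=0, gamma=1))
```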

15.
The problem of testing for a centre effect in multi-centre studies following a proportional hazards regression analysis is considered. Two approaches to the problem can be used. One fits a proportional hazards model with a fixed covariate included for each centre (except one). The need for a centre-specific adjustment is evaluated using either a score, Wald or likelihood ratio test of the hypothesis that all the centre-specific effects are equal to zero. An alternative approach is to introduce a random effect or frailty for each centre into the model. Recently, Commenges and Andersen have proposed a score test for this random effects model. By a Monte Carlo study we compare the performance of these two approaches when either the fixed or random effects model holds true. The study shows that for moderate samples the fixed effects tests have nominal levels much higher than specified, but the random effect test performs as expected under the null hypothesis. Under the alternative hypothesis the random effect test has good power to detect relatively small fixed or random centre effects. Also, if the centre effect is ignored the estimator of the main treatment effect may be quite biased and is inconsistent. The tests are illustrated on a retrospective multi-centre study of recovery from bone marrow transplantation.

16.
We have derived the variance of an expected utility for a probability tree in medical decision analysis based on a Taylor series approximation of the expected utility as a function of the probability and utility values used in the decision tree. The resulting variance estimate is an algebraic expression of the variances associated with the probability and utility estimates used. We also derive expressions for the case where the input parameter estimates are not independent. We discuss the choice of input parameters and their variance estimates and give an example that compares two protocols for the treatment of chlamydial infection.
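For the simplest possible tree, a single chance node with two branches, the first-order Taylor (delta-method) variance reduces to a short expression; the sketch below assumes independent inputs and hypothetical probability and utility estimates, whereas the paper also covers correlated inputs.

```python
# First-order (delta-method) variance of the expected utility of a simple
# chance node EU = p*U1 + (1-p)*U2, assuming p, U1, U2 are independent.
def eu_and_variance(p, var_p, u1, var_u1, u2, var_u2):
    eu = p * u1 + (1 - p) * u2
    # Partial derivatives: dEU/dp = u1 - u2, dEU/du1 = p, dEU/du2 = 1 - p
    var_eu = (u1 - u2) ** 2 * var_p + p**2 * var_u1 + (1 - p) ** 2 * var_u2
    return eu, var_eu

# Hypothetical inputs: cure probability 0.85 (SE 0.03), utilities 0.95 and 0.70.
print(eu_and_variance(0.85, 0.03**2, 0.95, 0.01**2, 0.70, 0.02**2))
```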

17.
In clinical trials, the study sample size is often chosen to provide specific power at a single point of a treatment difference. When this treatment difference is not close to the true one, the actual power of the trial can deviate from the specified power. To address this issue, we consider obtaining a flexible sample size design that provides sufficient power and has close to the 'ideal' sample size over possible values of the true treatment difference within an interval. A performance score is proposed to assess the overall performance of these flexible sample size designs. Its application to the determination of the best solution among considered candidate sample size designs is discussed and illustrated through computer simulations.

18.
In the planning of a clinical trial to compare the proportion of responses to two treatments, one determines the sample size to yield the desired power of achieving a significant difference at a pre-selected type I error under the assumption that the expected treatment difference exceeds a prescribed minimum. To achieve a practicable sample size, an investigator may be tempted to require a large treatment difference and thereby risk the chance of missing a somewhat smaller but clinically important difference. We obtain in this paper an expression for this loss of power if the true treatment difference is smaller than the minimum used to plan the study. The expression does not explicitly depend on the sample size and for most practical purposes does not depend on the actual response rates, but rather varies only as a function of the fractional difference between the required minimum and the true treatment difference. With provision of a prior distribution for this fractional difference, we can use the expression to calculate an expected power for the study. An illustration considers the case where the prior follows a beta distribution.
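Under a normal approximation, and ignoring the small change in variance with the response rates (which the abstract notes matters little for most practical purposes), the achieved power depends only on the ratio of the true to the planned treatment difference. A sketch of that calculation:

```python
from scipy.stats import norm

def approx_power(frac, alpha=0.05, planned_power=0.80):
    """Approximate achieved power when the true treatment difference is a
    fraction 'frac' of the difference used to plan the trial (normal
    approximation; the variance change with the response rates is ignored)."""
    za = norm.ppf(1 - alpha / 2)
    zb = norm.ppf(planned_power)
    return norm.cdf(frac * (za + zb) - za)

for frac in (1.0, 0.9, 0.8, 0.7, 0.5):
    print(f"true/planned difference = {frac:.1f}: power ~ {approx_power(frac):.2f}")
```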

19.
Multiple testing has been widely adopted for genome-wide studies such as microarray experiments. To improve the power of multiple testing, Storey (J. Royal Statist. Soc. B 2007; 69: 347-368) recently developed the optimal discovery procedure (ODP) which maximizes the number of expected true positives for each fixed number of expected false positives. However, in applying the ODP, we must estimate the true status of each significance test (null or alternative) and the true probability distribution corresponding to each test. In this article, we derive the ODP under hierarchical, random effects models and develop an empirical Bayes estimation method for the derived ODP. Our methods can effectively circumvent the estimation problems in applying the ODP presented by Storey. Simulations and applications to clinical studies of leukemia and breast cancer demonstrated that our empirical Bayes method achieved theoretical optimality and performed well in comparison with existing multiple testing procedures.

20.
To estimate the association of antiretroviral therapy initiation with incident acquired immunodeficiency syndrome (AIDS) or death while accounting for time-varying confounding in a cost-efficient manner, the authors combined a case-cohort study design with inverse probability-weighted estimation of a marginal structural Cox proportional hazards model. A total of 950 adults who were positive for human immunodeficiency virus type 1 were followed in 2 US cohort studies between 1995 and 2007. In the full cohort, 211 AIDS cases or deaths occurred during 4,456 person-years. In an illustrative 20% random subcohort of 190 participants, 41 AIDS cases or deaths occurred during 861 person-years. Accounting for measured confounders and determinants of dropout by inverse probability weighting, the full cohort hazard ratio was 0.41 (95% confidence interval: 0.26, 0.65) and the case-cohort hazard ratio was 0.47 (95% confidence interval: 0.26, 0.83). Standard multivariable-adjusted hazard ratios were closer to the null, regardless of study design. The precision lost with the case-cohort design was modest given the cost savings. Results from Monte Carlo simulations demonstrated that the proposed approach yields approximately unbiased estimates of the hazard ratio with appropriate confidence interval coverage. Marginal structural model analysis of case-cohort study designs provides a cost-efficient design coupled with an accurate analytic method for research settings in which there is time-varying confounding.
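As a simplified illustration of the weighting step, the sketch below computes stabilized inverse-probability-of-treatment weights for a point-treatment setting with a single simulated confounder; the actual analysis uses time-varying treatment and confounders, additional weights for dropout, and case-cohort sampling weights, none of which are shown. The fitted weights would then enter a weighted Cox model for the marginal structural parameter.

```python
import numpy as np
import statsmodels.api as sm

# Stabilized inverse-probability-of-treatment weights, point-treatment sketch.
rng = np.random.default_rng(3)
n = 2000
cd4 = rng.normal(350, 100, n)                                  # confounder
p_treat = 1 / (1 + np.exp(-(-1.0 - 0.004 * (cd4 - 350))))
a = rng.binomial(1, p_treat)                                   # treatment initiation

denom = sm.Logit(a, sm.add_constant(cd4)).fit(disp=0).predict()   # P(A=1 | L)
numer = sm.Logit(a, np.ones((n, 1))).fit(disp=0).predict()        # P(A=1)
sw = np.where(a == 1, numer / denom, (1 - numer) / (1 - denom))
print("mean stabilized weight (should be ~1):", sw.mean().round(3))
```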
