Similar Articles
20 similar articles retrieved.
1.
The generalized Wilcoxon and log-rank tests are commonly used for testing differences between two survival distributions. We modify the Wilcoxon test to account for auxiliary information on intermediate disease states that subjects may pass through before failure. For a disease with multiple states where patients are monitored periodically but exact transition times are unknown (e.g. staging in cancer), we first fit a multi-state Markov model to the full data set; when censoring precludes the comparison of survival times between two subjects, we use the model to estimate the probability that one subject will have survived longer than the other given their censoring times and last observed status, and use these probabilities to compute an expected rank for each subject. These expected ranks form the basis of our test statistic. Simulations demonstrate that the proposed test can improve power over the log-rank and generalized Wilcoxon tests in some settings while maintaining the nominal type 1 error rate. The method is illustrated on an amyotrophic lateral sclerosis data set. Copyright © 2015 John Wiley & Sons, Ltd.
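The key step is replacing unknown pairwise orderings with model-based probabilities and ranking subjects by their expected ranks. The sketch below illustrates only that final step under strong simplifying assumptions: the matrix `p_outlive[i, j] = P(subject i survives longer than j)` is taken as given (in the paper these probabilities come from a fitted multi-state Markov model, which is not reproduced here), and the permutation reference distribution and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def expected_ranks(p_outlive):
    """Expected rank of each subject from pairwise outliving probabilities.

    p_outlive[i, j] is assumed to satisfy p_outlive[i, j] + p_outlive[j, i] = 1
    for i != j.  The expected rank of i is 1 + sum over j of P(j outlives i).
    """
    beaten_by = p_outlive.T.copy()          # beaten_by[i, j] = P(j outlives i)
    np.fill_diagonal(beaten_by, 0.0)
    return 1.0 + beaten_by.sum(axis=1)

def expected_rank_sum_test(p_outlive, group, n_perm=5000, seed=1):
    """Rank-sum statistic on expected ranks, with a Monte-Carlo permutation p-value."""
    rng = np.random.default_rng(seed)
    ranks = expected_ranks(p_outlive)
    group = np.asarray(group, dtype=bool)
    obs = ranks[group].sum()
    perm = np.array([ranks[rng.permutation(group)].sum() for _ in range(n_perm)])
    p_value = np.mean(np.abs(perm - perm.mean()) >= np.abs(obs - perm.mean()))
    return obs, p_value

# Toy pairwise probabilities for 6 subjects (illustrative only)
rng = np.random.default_rng(0)
upper = rng.uniform(0.3, 0.7, size=(6, 6))
p = np.triu(upper, 1)
p = p + (1.0 - p.T) * np.tril(np.ones((6, 6)), -1)   # enforce p[i,j] + p[j,i] = 1
np.fill_diagonal(p, 0.5)
print(expected_rank_sum_test(p, group=[1, 1, 1, 0, 0, 0]))
```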

2.
In clinical trials with survival endpoint, it is common to observe an overlap between two Kaplan–Meier curves of treatment and control groups during the early stage of the trials, indicating a potential delayed treatment effect. Formulas have been derived for the asymptotic power of the log-rank test in the presence of delayed treatment effect and its accompanying sample size calculation. In this paper, we first reformulate the alternative hypothesis with the delayed treatment effect in a rescaled time domain, which can yield a simplified sample size formula for the log-rank test in this context. We further propose an intersection-union test to examine the efficacy of treatment with delayed effect and show it to be more powerful than the log-rank test. Simulation studies are conducted to demonstrate the proposed methods. Copyright © 2016 John Wiley & Sons, Ltd.
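For orientation, the sketch below shows the classical Schoenfeld events calculation for the standard log-rank test and a crude illustration of how a delayed onset dilutes the average hazard ratio and inflates the required number of events. The dilution rule used here (a geometric-average hazard ratio) is an assumption for illustration only, not the rescaled-time formula derived in the paper.

```python
import math
from scipy.stats import norm

def logrank_required_events(hr, alpha=0.05, power=0.9, allocation=0.5):
    """Schoenfeld's approximation to the total number of events required by
    the standard (unweighted) log-rank test under proportional hazards with
    hazard ratio `hr` and allocation fraction `allocation`."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return (z_a + z_b) ** 2 / (allocation * (1 - allocation) * math.log(hr) ** 2)

# With a delayed onset, the hazard ratio is 1 early on, so the average hazard
# ratio "seen" by the log-rank test is pulled toward 1 and more events are
# needed.  Assumed geometric-average dilution, purely for illustration:
hr, frac_events_before_onset = 0.7, 0.3
print(round(logrank_required_events(hr)))                                   # proportional hazards
print(round(logrank_required_events(hr ** (1 - frac_events_before_onset)))) # diluted by the delay
```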

3.
In clustered survival data, subunits within each cluster share similar characteristics, so that observations made from them tend to be positively correlated. In clinical trials, the correlated subunits from the same cluster are often randomized to different treatment groups. In this case, the variance formulas of the standard rank tests such as the logrank, Gehan-Wilcoxon or Prentice-Wilcoxon, proposed for independent samples, need to be adjusted for intracluster correlations both within and between treatment groups for testing equality of marginal survival distributions. In this paper we derive a general form of simple variance formulas of the rank tests when subunits from the same cluster are randomized into different treatment groups. Extensive simulation studies are conducted to investigate small sample performance of the variance formulas. We compare our non-parametric rank tests based on the adjusted variances with one from a shared frailty model, which is an optimal semi-parametric testing procedure when the intracluster correlations within and between groups are the same.

4.
The log-rank test is the most powerful non-parametric test for detecting a proportional hazards alternative and thus is the most commonly used testing procedure for comparing time-to-event distributions between different treatments in clinical trials. When the log-rank test is used for the primary data analysis, the sample size calculation should also be based on the test to ensure the desired power for the study. In some clinical trials, the treatment effect may not manifest itself right after patients receive the treatment. Therefore, the proportional hazards assumption may not hold. Furthermore, patients may discontinue the study treatment prematurely and thus may have diluted treatment effect after treatment discontinuation. If a patient's treatment termination time is independent of his/her time-to-event of interest, the termination time can be treated as a censoring time in the final data analysis. Alternatively, we may keep collecting time-to-event data until study termination from those patients who discontinued the treatment and conduct an intent-to-treat analysis by including them in the original treatment groups. We derive formulas necessary to calculate the asymptotic power of the log-rank test under this non-proportional hazards alternative for the two data analysis strategies. Simulation studies indicate that the formulas provide accurate power for a variety of trial settings. A clinical trial example is used to illustrate the application of the proposed methods. Copyright © 2009 John Wiley & Sons, Ltd.

5.
An improved method of sample size calculation for the one-sample log-rank test is provided. The one-sample log-rank test may be the method of choice if the survival curve of a single treatment group is to be compared with that of a historic control. Such settings arise, for example, in clinical phase-II trials if the response to a new treatment is measured by a survival endpoint. Present sample size formulas for the one-sample log-rank test are based on the number of events to be observed; that is, in order to achieve approximately the desired power for the allocated significance level and effect size, the trial is stopped as soon as a certain critical number of events is reached. We propose a new stopping criterion. Both approaches are shown to be asymptotically equivalent. For small sample sizes, though, a simulation study indicates that the new criterion might be preferred when planning a corresponding trial. In our simulations, the trial is usually underpowered and the nominal significance level is not fully used if the traditional stopping criterion based on the number of events is applied, whereas a trial based on the new stopping criterion maintains power with the type-I error rate still controlled. Copyright © 2014 John Wiley & Sons, Ltd.
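The statistic being planned for compares the observed number of events with the number expected under the historical-control cumulative hazard. A minimal sketch of the one-sample log-rank statistic is shown below; the exponential historical control, the rate `lambda0` and all variable names are illustrative assumptions, and neither stopping criterion from the paper is reproduced.

```python
import numpy as np

def one_sample_logrank(time, event, cumhaz0):
    """One-sample log-rank test against a historical control.

    time    : follow-up time for each subject (event or censoring time)
    event   : 1 if the event was observed, 0 if censored
    cumhaz0 : cumulative hazard function of the historical control
    Returns Z = (O - E) / sqrt(E), where O is the observed and E the expected
    number of events under the historical control.
    """
    time, event = np.asarray(time, float), np.asarray(event, int)
    observed = event.sum()
    expected = cumhaz0(time).sum()
    return (observed - expected) / np.sqrt(expected)

# Example: exponential historical control with rate 0.2 events per year (illustrative)
lambda0 = 0.2
z = one_sample_logrank(
    time=[1.2, 3.4, 0.8, 2.5, 4.0, 1.9],
    event=[1, 0, 1, 1, 0, 1],
    cumhaz0=lambda t: lambda0 * np.asarray(t),
)
print(z)  # negative values favour the new treatment (fewer events than expected)
```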

6.
Arming the immune system against cancer has emerged as a powerful tool in oncology during recent years. Instead of poisoning a tumor or destroying it with radiation, therapeutic cancer vaccine, a type of cancer immunotherapy, unleashes the immune system to combat cancer. This indirect mechanism-of-action of vaccines poses the possibility of a delayed onset of clinical effect, which results in a delayed separation of survival curves between the experimental and control groups in therapeutic cancer vaccine trials with time-to-event endpoints. This violates the proportional hazard assumption. As a result, the conventional study design based on the regular log-rank test ignoring the delayed effect would lead to a loss of power. In this paper, we propose two innovative approaches for sample size and power calculation using the piecewise weighted log-rank test to properly and efficiently incorporate the delayed effect into the study design. Both theoretical derivations and empirical studies demonstrate that the proposed methods, accounting for the delayed effect, can reduce sample size dramatically while achieving the target power relative to a standard practice. Copyright © 2016 John Wiley & Sons, Ltd.
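To make the piecewise weighting concrete, the sketch below implements a generic weighted log-rank statistic from scratch and applies a two-piece weight that is 0 before an assumed onset time and 1 afterwards. The weight, the simulated delayed-effect data and the 3-month delay are illustrative assumptions; the paper's sample size and power formulas are not reproduced.

```python
import numpy as np

def weighted_logrank(time, event, group, weight):
    """Weighted log-rank statistic for two groups; weight(t) = 1 recovers the
    ordinary log-rank test, while a weight that is 0 before the expected onset
    of the treatment effect down-weights the early, uninformative follow-up."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    group = np.asarray(group, int)            # 0 = control, 1 = experimental
    num, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                     # total at risk just before t
        n1 = (at_risk & (group == 1)).sum()   # at risk in the experimental group
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = weight(t)
        num += w * (d1 - d * n1 / n)
        if n > 1:
            var += w ** 2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num / np.sqrt(var)                 # approximately N(0, 1) under H0

# Piecewise weight: ignore events before an assumed 3-month delay
delay = 3.0
piecewise = lambda t: 0.0 if t < delay else 1.0

# Illustrative delayed-effect data via the memoryless property:
# treated subjects follow the control hazard before `delay` and a lower hazard after it.
rng = np.random.default_rng(7)
n = 200
grp = np.repeat([0, 1], n // 2)
t_ctrl = rng.exponential(12.0, n // 2)
u = rng.exponential(12.0, n // 2)             # control-rate draw for treated arm
extra = rng.exponential(20.0, n // 2)         # post-delay, lower-hazard draw
t_trt = np.where(u < delay, u, delay + extra)
tt = np.concatenate([t_ctrl, t_trt])
cens = rng.uniform(6, 24, n)
obs_time = np.minimum(tt, cens)
obs_event = (tt <= cens).astype(int)
print(weighted_logrank(obs_time, obs_event, grp, piecewise))       # piecewise weighted
print(weighted_logrank(obs_time, obs_event, grp, lambda t: 1.0))   # ordinary log-rank
```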

7.
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.

8.
Confounding due to population substructure is always a concern in genetic association studies. Although methods have been proposed to adjust for population stratification in the context of common variation, it is unclear how well these approaches will work when interrogating rare variation. Family-based association tests can be constructed that are robust to population stratification. For example, when considering a quantitative trait, a linear model can be used that decomposes genetic effects into between- and within-family components and a test of the within-family component is robust to population stratification. However, this within-family test ignores between-family information potentially leading to a loss of power. Here, we propose a family-based two-stage rare-variant test for quantitative traits. We first construct a weight for each variant within a gene, or other genetic unit, based on score tests of between-family effect parameters. These weights are then used to combine variants using score tests of within-family effect parameters. Because the between-family and within-family tests are orthogonal under the null hypothesis, this two-stage approach can increase power while still maintaining validity. Using simulation, we show that this two-stage test can significantly improve power while correctly maintaining type I error. We further show that the two-stage approach maintains the robustness to population stratification of the within-family test and we illustrate this using simulations reflecting samples composed of continental and closely related subpopulations.

9.
Clustered right-censored data often arise from tumorigenicity experiments and clinical trials. For testing the equality of two survival functions, Jung and Jeong extended weighted logrank (WLR) tests to two independent samples of clustered right-censored data, while the weighted Kaplan–Meier (WKM) test can be derived from the work of O'Gorman and Akritas. The weight functions in both classes of tests (WLR and WKM) can be selected to be more sensitive to detect a certain alternative; however, since the exact alternative is unknown, it is difficult to specify the selected weights in advance. Since WLR is rank-based, it is not sensitive to the magnitude of the difference in survival times. Although WKM is constructed to be more sensitive to the magnitude of the difference in survival times, it is not sensitive to late hazard differences. Therefore, in order to combine the advantages of these two classes of tests, this paper developed a class of versatile tests based on simultaneously using WLR and WKM for two independent samples of clustered right-censored data. The comparative results from a simulation study are presented and the implementation of the versatile tests to two real data sets is illustrated. Copyright © 2009 John Wiley & Sons, Ltd.

10.
Two-period two-treatment (2×2) crossover designs are commonly used in clinical trials. For continuous endpoints, it has been shown that baseline (pretreatment) measurements collected before the start of each treatment period can be useful in improving the power of the analysis. Methods to achieve a corresponding gain for censored time-to-event endpoints have not been adequately studied. We propose a method in which censored values are treated as missing data and multiply imputed using prespecified parametric event time models. The event times in each imputed data set are then log-transformed and analyzed using a linear model suitable for a 2×2 crossover design with continuous endpoints, with the difference in period-specific baselines included as a covariate. Results obtained from the imputed data sets are synthesized for point and confidence interval estimation of the treatment ratio of geometric mean event times using model averaging in conjunction with Rubin's combination rule. We use simulations to illustrate the favorable operating characteristics of our method relative to two other methods for crossover trials with censored time-to-event data, ie, a hierarchical rank test that ignores the baselines and a stratified Cox model that uses each study subject as a stratum and includes period-specific baselines as a covariate. Application to a real data example is provided.
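The imputation models and the model-averaging step are not reproduced here; the sketch below shows only Rubin's combination rule applied to per-imputation estimates of the log treatment ratio, with illustrative numbers in place of real imputed-data-set results.

```python
import numpy as np
from scipy.stats import t as t_dist

def rubin_combine(estimates, variances, alpha=0.05):
    """Combine per-imputation estimates with Rubin's rule.

    estimates : point estimate (e.g. log treatment ratio) from each imputed data set
    variances : corresponding squared standard errors
    Returns the pooled estimate, its standard error and a (1 - alpha) confidence interval.
    """
    q = np.asarray(estimates, float)
    u = np.asarray(variances, float)
    m = len(q)
    q_bar = q.mean()                      # pooled point estimate
    w_bar = u.mean()                      # within-imputation variance
    b = q.var(ddof=1)                     # between-imputation variance
    total = w_bar + (1 + 1 / m) * b       # total variance
    df = (m - 1) * (1 + w_bar / ((1 + 1 / m) * b)) ** 2   # Rubin's degrees of freedom
    half = t_dist.ppf(1 - alpha / 2, df) * np.sqrt(total)
    return q_bar, np.sqrt(total), (q_bar - half, q_bar + half)

# Illustrative log treatment ratios from m = 5 imputed data sets
est, se2 = [0.21, 0.18, 0.25, 0.19, 0.23], [0.010, 0.012, 0.011, 0.009, 0.013]
mean, se, ci = rubin_combine(est, se2)
print(np.exp(mean), np.exp(ci[0]), np.exp(ci[1]))   # back-transform to the ratio scale
```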

11.
Where treatments are administered to groups of patients or delivered by therapists, outcomes for patients in the same group or treated by the same therapist may be more similar, leading to clustering. Trials of such treatments should take account of this effect. Where such a treatment is compared with an un-clustered treatment, the trial has a partially nested design. This paper compares statistical methods for this design where the outcome is binary. Investigation of consistency reveals that a random coefficient model with a random effect for group or therapist is not consistent with other methods for a null treatment effect, and so this model is not recommended for this design. Small sample performance of a cluster-adjusted test of proportions, a summary measures test and logistic generalised estimating equations and random intercept models are investigated through simulation. The expected treatment effect is biased for the logistic models. Empirical test size of two-sided tests is raised only slightly, but there are substantial biases for one-sided tests. Three formulae are proposed for calculating sample size and power based on (i) the difference of proportions, (ii) the log-odds ratio or (iii) the arc-sine transformation of proportions. Calculated power from these formulae is compared with empirical power from a simulation study. Logistic models appeared to perform better than those based on proportions with the likelihood ratio test performing best in the range of scenarios considered. For these analyses, the log-odds ratio method of calculation of power gave an approximate lower limit for empirical power. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
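As an illustration of the difference-of-proportions approach (i) in a partially nested setting, the sketch below inflates only the clustered arm's variance term by the usual design effect 1 + (m − 1)·ICC. This is a generic approximation with illustrative inputs, not the exact formulae derived in the paper.

```python
from math import ceil
from scipy.stats import norm

def partially_nested_n(p_control, p_treat, cluster_size, icc, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a partially nested trial with a binary
    outcome, based on the difference of proportions.  Only the treated
    (group/therapist-delivered) arm is clustered, so only its variance term is
    inflated by the design effect 1 + (m - 1) * icc."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    deff = 1 + (cluster_size - 1) * icc
    var_sum = deff * p_treat * (1 - p_treat) + p_control * (1 - p_control)
    return ceil((z_a + z_b) ** 2 * var_sum / (p_treat - p_control) ** 2)

print(partially_nested_n(0.30, 0.50, cluster_size=8, icc=0.05))   # clustered treated arm
print(partially_nested_n(0.30, 0.50, cluster_size=1, icc=0.05))   # no clustering, for comparison
```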

12.
The power of a chi-square test, and thus the required sample size, are functions of the noncentrality parameter that can be obtained as the limiting expectation of the test statistic under an alternative hypothesis specification. Herein, we apply this principle to derive simple expressions for two tests that are commonly applied to discrete ordinal data. The Wilcoxon rank sum test for the equality of distributions in two groups is algebraically equivalent to the Mann–Whitney test. The Kruskal–Wallis test applies to multiple groups. These tests are equivalent to a Cochran–Mantel–Haenszel mean score test using rank scores for a set of C discrete categories. Although various authors have assessed the power function of the Wilcoxon and Mann–Whitney tests, herein it is shown that the power of these tests with discrete observations, that is, with tied ranks, is readily provided by the power function of the corresponding Cochran–Mantel–Haenszel mean scores test for two and R > 2 groups. These expressions yield results virtually identical to those derived previously for rank scores and also apply to other score functions. The Cochran–Armitage test for trend assesses whether there is a monotonically increasing or decreasing trend in the proportions with a positive outcome or response over the C ordered categories of an ordinal independent variable, for example, dose. Herein, it is shown that the power of the test is a function of the slope of the response probabilities over the ordinal scores assigned to the groups, which yields simple expressions for the power of the test. Copyright © 2011 John Wiley & Sons, Ltd.
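Once the noncentrality parameter implied by an alternative is available, the power calculation itself is the generic noncentral chi-square recipe: power = P(χ²(df, ncp) > χ²(df, 1 − α)). The helper below implements that step and inverts it by bisection; the example noncentrality value is illustrative, and the paper's closed-form expressions for the Wilcoxon/Kruskal–Wallis and Cochran–Armitage noncentrality parameters are not reproduced.

```python
from scipy.stats import chi2, ncx2

def chi2_power(ncp, df, alpha=0.05):
    """Power of a chi-square test with `df` degrees of freedom when the statistic
    follows a noncentral chi-square with noncentrality `ncp` under the alternative."""
    crit = chi2.ppf(1 - alpha, df)
    return ncx2.sf(crit, df, ncp)

def ncp_for_power(df, power=0.8, alpha=0.05):
    """Smallest noncentrality parameter giving the desired power (bisection).
    Since the noncentrality parameter is proportional to the sample size,
    the required n scales linearly with this value."""
    lo, hi = 0.0, 100.0
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_power(mid, df, alpha) < power else (lo, mid)
    return hi

print(chi2_power(ncp=7.85, df=1))   # roughly 0.80 for a 1-df test at alpha = 0.05
print(ncp_for_power(df=1))          # roughly 7.85
```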

13.
M Lu, B C Tilley. Statistics in Medicine, 2001, 20(13): 1891-1901
In clinical trials, when a single outcome is not sufficient to describe the underlying concept of interest, it may be necessary to compare treatment groups on multiple correlated outcomes. A global test based on a logit link function provides an estimate of the odds ratio for assessing a common treatment effect among correlated binary outcomes. In this paper we extend the use of generalized estimating equations (GEE) to calculate a common relative risk from correlated binary outcomes based on a log link function. In the context of global tests, we discuss the equivalence and difference between logit and log links and their estimates. We also derive a formula for calculating a common risk difference between two treatment groups based on multiple correlated binary outcomes with categorical covariates, assuming the asymptotic equivalency between the logit and log-linear links. We discuss the statistical tools to be used in choosing between the logit and log links when models on different links yield contrasting results. Examples using data from the NINDS t-PA Stroke Trials are provided. We conclude, in a study of correlated binary outcomes, that the choice of the logit or log link could be based on a comparison of goodness-of-link.
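A sketch of fitting both links with GEE on long-format data (one row per subject-outcome pair) is shown below, assuming the statsmodels API; the simulated data, column names and working correlation structure are illustrative, and log-binomial GEE fits can fail to converge in practice, which is part of why the choice between links matters.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative long-format data: binary response y, treatment indicator,
# three correlated outcomes per subject (clustered by subject).
rng = np.random.default_rng(3)
n_subj, n_out = 300, 3
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_out),
    "outcome": np.tile(np.arange(n_out), n_subj),
    "treat": np.repeat(rng.integers(0, 2, n_subj), n_out),
})
df["y"] = rng.binomial(1, np.where(df["treat"] == 1, 0.45, 0.30))

X = sm.add_constant(df[["treat"]])

# Logit link: exp(coefficient) is a common odds ratio across outcomes
gee_logit = sm.GEE(df["y"], X, groups=df["subject"],
                   family=sm.families.Binomial(),
                   cov_struct=sm.cov_struct.Exchangeable()).fit()

# Log link: exp(coefficient) is a common relative risk across outcomes
# (older statsmodels versions spell the link class in lower case: links.log())
gee_log = sm.GEE(df["y"], X, groups=df["subject"],
                 family=sm.families.Binomial(link=sm.families.links.Log()),
                 cov_struct=sm.cov_struct.Exchangeable()).fit()

print("odds ratio   :", np.exp(gee_logit.params["treat"]))
print("relative risk:", np.exp(gee_log.params["treat"]))
```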

14.

Background

Although many researchers in the field of health economics and quality of care compare the length of stay (LOS) in two inpatient samples, they often fail to check whether the sample meets the assumptions made by their chosen statistical test. In fact, LOS data show a highly right-skewed, discrete distribution in which most of the observations are tied; this violates the assumptions of most statistical tests.

Objectives

To estimate the type I and type II errors associated with the application of 12 different statistical tests to a series of LOS samples.

Methods

The LOS distribution was extracted from an exhaustive French national database of inpatient stays. The type I error was estimated using 19 sample sizes and 1,000,000 simulations per sample. The type II error was estimated in three alternative scenarios. For each test, the type I and type II errors were plotted as a function of the sample size.

Results

Gamma regression with log link, the log-rank test, median regression, Poisson regression, and Weibull survival analysis presented an unacceptably high type I error. In contrast, the standard Student t test, linear regression with log link, and the Cox models had an acceptable type I error but low power.

Conclusions

When comparing the LOS for two balanced inpatient samples, the Student t test with logarithmic or rank transformation, the Wilcoxon test, and the Kruskal-Wallis test are the only methods with an acceptable type I error and high power.
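A minimal simulation in the spirit of this study is sketched below: both samples are drawn from the same right-skewed, heavily tied distribution (here a rounded lognormal, an assumption standing in for the French national LOS distribution used in the paper), and the empirical type I error of a few of the compared tests is estimated.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu, kruskal

rng = np.random.default_rng(42)

def simulate_los(n):
    """Right-skewed, discrete LOS-like data with many ties (illustrative:
    a rounded lognormal, not the French national LOS distribution)."""
    return np.maximum(1, np.round(rng.lognormal(mean=1.5, sigma=0.8, size=n)))

def type_one_error(n=50, n_sim=2000, alpha=0.05):
    """Empirical type I error when both samples come from the same distribution."""
    rejections = np.zeros(4)
    for _ in range(n_sim):
        a, b = simulate_los(n), simulate_los(n)
        p_values = [
            ttest_ind(a, b).pvalue,                               # Student t test
            ttest_ind(np.log(a), np.log(b)).pvalue,               # t test on log(LOS)
            mannwhitneyu(a, b, alternative="two-sided").pvalue,   # Wilcoxon rank-sum
            kruskal(a, b).pvalue,                                 # Kruskal-Wallis
        ]
        rejections += np.array(p_values) < alpha
    return rejections / n_sim

print(dict(zip(["t", "log-t", "Wilcoxon", "Kruskal-Wallis"], type_one_error())))
```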

15.
Our aim is to develop a rich and coherent framework for modeling correlated time-to-event data, including (1) survival regression models with different links and (2) flexible modeling for time-dependent and nonlinear effects with rich postestimation. We extend the class of generalized survival models, which expresses a transformed survival in terms of a linear predictor, by incorporating a shared frailty or random effects for correlated survival data. The proposed approach can include parametric or penalized smooth functions for time, time-dependent effects, nonlinear effects, and their interactions. The maximum (penalized) marginal likelihood method is used to estimate the regression coefficients and the variance for the frailty or random effects. The optimal smoothing parameters for the penalized marginal likelihood estimation can be automatically selected by a likelihood-based cross-validation criterion. For models with normal random effects, Gauss-Hermite quadrature can be used to obtain the cluster-level marginal likelihoods. The Akaike Information Criterion can be used to compare models and select the link function. We have implemented these methods in the R package rstpm2. Simulating for both small and larger clusters, we find that this approach performs well. Through 2 applications, we demonstrate (1) a comparison of proportional hazards and proportional odds models with random effects for clustered survival data and (2) the estimation of time-varying effects on the log-time scale, age-varying effects for a specific treatment, and two-dimensional splines for time and age.
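The rstpm2 machinery itself is in R; the short sketch below illustrates only the Gauss–Hermite step for the cluster-level marginal likelihood, using a Weibull proportional-hazards model with a shared normal random intercept as a much simpler special case of the generalized survival models discussed. All parameter values and data are illustrative.

```python
import numpy as np

def cluster_marginal_loglik(times, events, x, beta, shape, scale, sigma, n_nodes=15):
    """Marginal log-likelihood of one cluster in a Weibull PH model with a shared
    normal random intercept b ~ N(0, sigma^2), via Gauss-Hermite quadrature:
        L_i = integral of phi(b; 0, sigma^2) * prod_j f(t_ij | b) db
            ~ (1/sqrt(pi)) * sum_k w_k * prod_j f(t_ij | sqrt(2)*sigma*z_k)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    t = np.asarray(times, float)
    d = np.asarray(events, int)
    x = np.asarray(x, float)
    total = 0.0
    for z, w in zip(nodes, weights):
        b = np.sqrt(2.0) * sigma * z                     # random-intercept value at this node
        eta = x * beta + b                               # linear predictor plus frailty
        haz = shape / scale * (t / scale) ** (shape - 1) * np.exp(eta)
        cumhaz = (t / scale) ** shape * np.exp(eta)
        loglik_j = d * np.log(haz) - cumhaz              # per-subject log-likelihood
        total += w * np.exp(loglik_j.sum())
    return np.log(total / np.sqrt(np.pi))

# One illustrative cluster of 4 correlated subjects
print(cluster_marginal_loglik(times=[2.1, 3.5, 1.2, 4.8], events=[1, 0, 1, 1],
                              x=[1, 1, 0, 0], beta=-0.4,
                              shape=1.3, scale=5.0, sigma=0.5))
```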

16.
In clinical trials with time-to-event endpoints, it is not uncommon to see a significant proportion of patients being cured (or long-term survivors), such as in trials in non-Hodgkin's lymphoma. The widely used sample size formula derived under the proportional hazards (PH) model may not be appropriate for designing a survival trial with a cure fraction, because the PH model assumption may be violated. To account for a cure fraction, the PH cure model is widely used in practice, where a PH model is used for survival times of uncured patients and a logistic model is used for the probability of patients being cured. In this paper, we develop a sample size formula on the basis of the PH cure model by investigating the asymptotic distributions of the standard weighted log-rank statistics under the null and local alternative hypotheses. The derived sample size formula under the PH cure model is more flexible because it can be used to test the differences in the short-term survival and/or cure fraction. Furthermore, we investigate, as numerical examples, the impacts of accrual methods and of the durations of the accrual and follow-up periods on the sample size calculation. The results show that ignoring the cure rate in sample size calculation can lead to either underpowered or overpowered studies. We evaluate the performance of the proposed formula by simulation studies and provide an example to illustrate its application with the use of data from a melanoma trial. Copyright © 2012 John Wiley & Sons, Ltd.

17.
Preclinical evaluation of candidate human immunodeficiency virus (HIV) vaccines entails challenge studies whereby non-human primates such as macaques are vaccinated with either an active or control vaccine and then challenged (exposed) with a simian version of HIV. Repeated low-dose challenge (RLC) studies in which each macaque is challenged multiple times (either until infection or some maximum number of challenges is reached) are becoming more common in an effort to mimic natural exposure to HIV in humans. Statistical methods typically employed for testing for a vaccine effect in RLC studies include a modified version of Fisher's exact test as well as large sample approaches such as the usual log-rank test. Unfortunately, these methods are not guaranteed to provide a valid test for the effect of vaccination. On the other hand, valid tests for vaccine effect such as the exact log-rank test may not be easy to implement using software available to many researchers. This paper details which statistical approaches are appropriate for the analysis of RLC studies, and how to implement these methods easily in SAS or R. Copyright © 2015 John Wiley & Sons, Ltd.
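The central concern is validity with the very small samples typical of macaque studies, for which exact rather than asymptotic reference distributions are preferable. The sketch below is not the exact log-rank test discussed in the paper, but a simple Monte-Carlo permutation test of a Gehan-type pairwise score for challenges-to-infection data with administrative censoring at the maximum number of challenges; data and names are illustrative.

```python
import numpy as np

def gehan_statistic(n_challenges, infected, group):
    """Gehan-type two-sample score for repeated low-dose challenge data.

    For each (vaccinated, control) pair: +1 if the vaccinated animal demonstrably
    withstood more challenges, -1 if demonstrably fewer, 0 if the ordering is
    ambiguous because of censoring at the maximum number of challenges."""
    n_challenges = np.asarray(n_challenges, float)
    infected = np.asarray(infected, bool)     # False = never infected (censored)
    group = np.asarray(group, bool)           # True = vaccinated
    total = 0
    for i in np.where(group)[0]:
        for j in np.where(~group)[0]:
            if infected[j] and n_challenges[i] > n_challenges[j]:
                total += 1
            elif infected[i] and (n_challenges[i] < n_challenges[j] or
                                  (n_challenges[i] == n_challenges[j] and not infected[j])):
                total -= 1
    return total

def permutation_p_value(n_challenges, infected, group, n_perm=10000, seed=11):
    """Two-sided Monte-Carlo permutation p-value obtained by reshuffling vaccine labels."""
    rng = np.random.default_rng(seed)
    group = np.asarray(group, bool)
    obs = gehan_statistic(n_challenges, infected, group)
    perms = [gehan_statistic(n_challenges, infected, rng.permutation(group))
             for _ in range(n_perm)]
    return np.mean(np.abs(perms) >= abs(obs))

# Illustrative data: 8 macaques per arm, maximum of 10 challenges
challenges = [3, 5, 10, 7, 10, 6, 9, 10,  2, 4, 3, 5, 2, 6, 4, 3]
infected   = [1, 1, 0, 1, 0, 1, 1, 0,     1, 1, 1, 1, 1, 1, 1, 1]
vaccine    = [1] * 8 + [0] * 8
print(permutation_p_value(challenges, infected, vaccine))
```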

18.
Frequently in clinical studies a primary outcome is formulated from a vector of binary events. Several methods exist to assess treatment effects on multiple correlated binary outcomes, including comparing groups on the occurrence of at least one among the outcomes ('collapsed composite'), on the count of outcomes observed per subject, on individual outcomes adjusting for multiplicity, or with multivariate tests postulating either common or distinct effects across outcomes. We focus on a 1-df distinct effects test in which the estimated outcome-specific treatment effects from a GEE model are simply averaged, and compare it with other methods on clinical and statistical grounds. Using a flexible method to simulate multivariate binary data, we show that the relative efficiencies of the assessed tests depend in a complex way on the magnitudes and variabilities of component incidences and treatment effects, as well as correlations among component events. While other tests are easily 'driven' by high-frequency components, the average effect GEE test is not, since it averages the log odds ratios unweighted by the component frequencies. Thus, the average effect test is relatively more powerful than other tests when lower frequency components have stronger associations with a treatment or other predictor, but less powerful when higher frequency components are more strongly associated. In studies when relative effects are at least as important as absolute effects, or when lower frequency components are clinically most important, this test may be preferred. Two clinical trials are discussed and analyzed, and recommendations for practice are made. Copyright © 2010 John Wiley & Sons, Ltd.

19.
When the one-sample or two-sample t-test is either taught in the classroom or applied in practice to small samples, there is considerable divergence of opinion as to whether or not the inferences drawn are valid. Many point to the 'robustness' of the t-test to violations of assumptions, while others use rank or other robust methods because they believe that the t-test is not robust against violations of such assumptions. It is quite likely, despite the apparent divergence of these two opinions, that both arguments have considerable merit. If we agree that this question cannot possibly be resolved in general, the issue becomes one of determining, before any actual data have been collected, whether the t-test will or will not be robust in a specific application. This paper describes Statistical Analysis System (SAS) software, covering a large collection of potential input probability distributions, to investigate both the null and power properties of various one- and two-sample t-tests and their normal approximations, as well as the Wilcoxon two-sample and sign-rank one-sample tests, allowing potential practitioners to determine, at the study design stage, whether the t-test will be robust in their specific application. Sample size projections, based on these actual distributions, are also included. This paper is not intended as a tool to assess robustness after the data have been collected. Copyright © 2009 John Wiley & Sons, Ltd.

20.
Comparing two samples with a continuous non-negative score, e.g. a utility score over [0, 1], with a substantial proportion, say 50 per cent, scoring 0 presents distributional problems for most standard tests. A Wilcoxon rank test can be used, but the large number of ties reduces power. I propose a new test, the Wilcoxon rank-sum test performed after removing an equal (and maximal) number of 0's from each sample. This test recovers much of the power. Compared with a (directional) modification of a two-part test proposed by Lachenbruch, the truncated Wilcoxon has similar power when the non-zero scores are independent of the proportion of zeros, but, unlike the two-part test, the truncated Wilcoxon is relatively unaffected when these processes are dependent. Copyright © 2009 John Wiley & Sons, Ltd.
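A direct reading of the procedure described above is sketched below: drop min(#zeros in x, #zeros in y) zeros from each sample, then apply the ordinary rank-sum test to what remains. The utility-score data are simulated for illustration, and the implementation is an interpretation of the description rather than the author's code.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def truncated_wilcoxon(x, y):
    """Wilcoxon rank-sum test after removing an equal (and maximal) number of
    zeros from each sample: drop min(#zeros in x, #zeros in y) zeros from both
    samples, then apply the ordinary rank-sum test to the remaining values."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = min(np.sum(x == 0), np.sum(y == 0))       # zeros removable from each sample

    def drop_zeros(a, k):
        kept_zero_idx = np.where(a == 0)[0][k:]   # zeros retained after removal
        return np.concatenate([a[a != 0], a[kept_zero_idx]])

    return mannwhitneyu(drop_zeros(x, k), drop_zeros(y, k), alternative="two-sided")

# Utility scores on [0, 1] with a large spike at 0 (illustrative data)
rng = np.random.default_rng(5)
x = np.where(rng.random(60) < 0.5, 0.0, rng.beta(2, 2, 60))   # control
y = np.where(rng.random(60) < 0.4, 0.0, rng.beta(3, 2, 60))   # treatment
print(truncated_wilcoxon(x, y))
print(mannwhitneyu(x, y, alternative="two-sided"))            # ordinary test, many ties at 0
```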
