20 similar references retrieved.
1.
There are many different proposed procedures for sample size planning for the Wilcoxon-Mann-Whitney test at given type-I and type-II error rates α and β, respectively. Most methods assume very specific models or types of data to simplify calculations (e.g., ordered categorical or metric data, location-shift alternatives). We present a unified approach that covers metric data with and without ties, count data, ordered categorical data, and even dichotomous data. To this end, we calculate the unknown theoretical quantities, such as the variances under the null and the relevant alternative hypothesis, by the following “synthetic data” approach: we evaluate data whose empirical distribution functions match the theoretical distribution functions involved in the computations of the unknown theoretical quantities. Well-known relations for the ranks of the data are then used for the calculations. In addition to computing the necessary sample size N for a fixed allocation proportion t = n1/N, where n1 is the sample size in the first group and N = n1 + n2 is the total sample size, we provide an interval for the optimal allocation rate t, which minimizes the total sample size N. It turns out that, for certain distributions, a balanced design is optimal, and we give a characterization of such distributions. Furthermore, we show that the optimal choice of t depends on the ratio of the two variances that determine the variance of the Wilcoxon-Mann-Whitney statistic under the alternative. This is different from the optimal sample size allocation in the normal distribution model.
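The paper's synthetic-data calculations are not reproduced here; as a point of orientation only, Noether's classical approximation shows how the allocation rate t enters the total sample size. A minimal sketch (the function name and the use of p = P(X < Y) as the effect measure are illustrative assumptions, not the authors' method):

```python
from scipy.stats import norm

def wmw_total_n(p, t=0.5, alpha=0.05, beta=0.2):
    """Noether's approximation for the total Wilcoxon-Mann-Whitney
    sample size; p = P(X < Y) under the alternative, t = n1/N."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return z ** 2 / (12 * t * (1 - t) * (p - 0.5) ** 2)

# Balanced versus 60:40 allocation for p = 0.65:
print(wmw_total_n(0.65), wmw_total_n(0.65, t=0.6))
```

The 1/(t(1-t)) factor makes the balanced design optimal in this simple approximation; the abstract's point is that under the exact variances this need not hold.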
2.
Health-related quality of life (HRQoL) measures are increasingly used in trials as primary outcome measures. Investigators are now asking statisticians for advice on how to plan and analyse studies using such outcomes. HRQoL outcomes, like the SF-36, are usually measured on an ordinal scale, although most investigators assume that there exists an underlying continuous latent variable and that the actual measured outcomes (the ordered categories) reflect contiguous intervals along this continuum. The ordinal scaling of HRQoL measures means they tend to generate data that have discrete, bounded and skewed distributions. Thus, standard methods of analysis that assume Normality and constant variance may not be appropriate. For this reason, conventional statistical advice would suggest that non-parametric methods be used to analyse HRQoL data. The bootstrap is one such computer-intensive non-parametric method for estimating sample sizes and analysing data. We describe three methods of estimating sample sizes for two-group cross-sectional comparisons of HRQoL outcomes. We then compare the power of the three methods for a two-group cross-sectional study design using bootstrap simulation. The results showed that under the location-shift alternative hypothesis, conventional methods of sample size estimation performed well, particularly Whitehead's method. Whitehead's method is recommended if the HRQoL outcome has a limited number of discrete values (<7) and/or the expected proportion of cases at either of the bounds is high. If a pilot data set is readily available, then bootstrap simulation will provide a more accurate and reliable estimate than conventional methods. Finally, we use the bootstrap for hypothesis testing and for the estimation of standard errors and confidence intervals for parameters in an example data set, and compare and contrast the bootstrap with standard methods of analysing HRQoL outcomes. In the data set studied, with the SF-36 outcome, the use of the bootstrap for estimating sample sizes and analysing HRQoL data produces results similar to conventional statistical methods. These results suggest that bootstrap methods are no more appropriate for analysing HRQoL outcome data than standard methods.
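The bootstrap power-estimation idea is easy to sketch when pilot data are available: resample each group with replacement at the candidate sample size and count rejections. A minimal sketch (the Mann-Whitney test stands in for whatever analysis is planned; all names are illustrative):

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bootstrap_power(pilot_a, pilot_b, n_per_group, alpha=0.05,
                    n_sim=2000, seed=1):
    """Estimate power by resampling pilot HRQoL data with replacement
    and counting how often the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        a = rng.choice(pilot_a, size=n_per_group, replace=True)
        b = rng.choice(pilot_b, size=n_per_group, replace=True)
        if mannwhitneyu(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_sim
```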
3.
O'Brien (Biometrics 1984; 40:1079-1087) introduced a rank-sum-type global statistical test to summarize a treatment's effect on multiple outcomes and to determine whether one treatment is better than another. This paper presents a sample size computation method for clinical trial designs with multiple primary outcomes in which O'Brien's test or its modified version (Biometrics 2005; 61:532-539) is used for the primary analysis. A new measure, the global treatment effect (GTE), is introduced to summarize a treatment's efficacy across multiple primary outcomes, and its computation under various settings is provided. Sample size methods are presented based on a prespecified GTE both when pilot data are available and when they are not, and the optimal randomization ratio is given for both cases. We compare our sample size method with the Bonferroni adjustment for multiple tests. Since ranks are used in our derivation, the sample size formulas derived here are invariant to any monotone transformation of the data and are robust to outliers and skewed distributions. When all outcomes are binary, we show how sample size is affected by the success probabilities of the outcomes. Simulation shows that these sample size formulas provide good control of the type I error and statistical power. An application to a Parkinson's disease clinical trial design is demonstrated, and S-PLUS code to compute the sample size and the test statistic is provided.
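O'Brien's rank-sum procedure itself is simple to state: rank each outcome across all subjects, sum each subject's ranks, and compare the per-subject rank sums between arms. A sketch under those assumptions (a t-test on the rank sums is one common choice; the paper's GTE and sample size computations are not reproduced):

```python
import numpy as np
from scipy.stats import rankdata, ttest_ind

def obrien_rank_sum_test(x, y):
    """O'Brien's rank-sum global test for multiple outcomes.
    x, y: (n_subjects, n_outcomes) arrays for the two arms."""
    combined = np.vstack([x, y])
    # Rank each outcome over all subjects, then sum ranks within subject.
    ranks = np.column_stack([rankdata(combined[:, j])
                             for j in range(combined.shape[1])])
    scores = ranks.sum(axis=1)
    return ttest_ind(scores[:len(x)], scores[len(x):])
```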
4.
Fosgate GT. Statistics in Medicine 2005; 24(18):2857-2866
The design of epidemiologic studies for the validation of diagnostic tests necessitates accurate sample size calculations to allow for the estimation of diagnostic sensitivity and specificity within a specified level of precision and with the desired level of confidence. Confidence intervals based on the normal approximation to the binomial do not achieve the specified coverage when the proportion is close to 1. A sample size algorithm based on the exact mid-P method of confidence interval estimation was developed to address the limitations of normal approximation methods. This algorithm resulted in sample sizes that achieved the appropriate confidence interval width even in situations when normal approximation methods performed poorly.
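A sketch of the kind of search such an algorithm performs, assuming the design target is a maximum mid-P interval width at the anticipated sensitivity or specificity (this is not Fosgate's published algorithm; function names are illustrative):

```python
from scipy.stats import binom
from scipy.optimize import brentq

def midp_ci(x, n, alpha=0.05):
    """Mid-P confidence interval for a binomial proportion."""
    def upper(p):  # P(X < x) + 0.5*P(X = x) - alpha/2, decreasing in p
        return binom.cdf(x - 1, n, p) + 0.5 * binom.pmf(x, n, p) - alpha / 2
    def lower(p):  # P(X > x) + 0.5*P(X = x) - alpha/2, increasing in p
        return binom.sf(x, n, p) + 0.5 * binom.pmf(x, n, p) - alpha / 2
    lo = 0.0 if x == 0 else brentq(lower, 1e-9, 1 - 1e-9)
    hi = 1.0 if x == n else brentq(upper, 1e-9, 1 - 1e-9)
    return lo, hi

def required_n(p_expected, max_width, alpha=0.05):
    """Smallest n whose mid-P CI at the anticipated count is narrow enough."""
    n = 10
    while True:
        lo, hi = midp_ci(round(n * p_expected), n, alpha)
        if hi - lo <= max_width:
            return n
        n += 1

# Sensitivity anticipated near 0.95, CI width at most 0.10:
print(required_n(0.95, 0.10))
```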
5.
Hsieh FY. Statistics in Medicine 1987; 6(5):577-581
This paper presents a simple method of calculating sample sizes for unequal-sample-size designs using published tables for equal-sample-size designs. The method applies to both the logrank test and the t-test. For the power of the logrank test, this paper compares the proposed method with existing methods and with Monte Carlo simulation.
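Hsieh's idea can be illustrated with the familiar variance-matching adjustment that converts a tabulated balanced-design total into an unequal allocation with ratio k = n2/n1 (a generic sketch, not necessarily the paper's exact formula):

```python
def unequal_from_equal(n_total_equal, k):
    """Convert a balanced-design total sample size to an unequal design
    with allocation ratio k = n2/n1 at (approximately) the same power,
    by matching the variance factor 1/n1 + 1/n2."""
    n_total = n_total_equal * (1 + k) ** 2 / (4 * k)
    n1 = n_total / (1 + k)
    return n1, k * n1

# A balanced trial needing 200 subjects, reallocated 2:1:
print(unequal_from_equal(200, 2))  # (75, 150): 1/75 + 1/150 = 1/100 + 1/100
```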
6.
Diagnostic tests rarely provide perfect results. The misclassification induced by imperfect sensitivities and specificities of diagnostic tests must be accounted for when planning prevalence studies or investigations into properties of new tests. Previous work has shown that applying a single imperfect test to estimate prevalence can often result in very large sample size requirements, and that sometimes even an infinite sample size is insufficient for precise estimation because the problem is non-identifiable. Adding a second test can sometimes reduce the sample size substantially, but infinite sample sizes can still occur as the problem remains non-identifiable. We investigate the further improvement possible when three diagnostic tests are to be applied. We first develop the methods required for studies in which three conditionally independent tests are available, using different Bayesian criteria. We then apply these criteria to prototypic scenarios, showing that large sample size reductions can occur compared to when only one or two tests are used. As the problem is now identifiable, infinite sample sizes cannot occur except in pathological situations. Finally, we relax the conditional independence assumption, demonstrating in this once again non-identifiable situation that sample sizes may grow substantially and possibly be infinite. We apply our methods to the planning of two infectious disease studies, the first designed to estimate the prevalence of Strongyloides infection, and the second relating to estimating the sensitivity of a new test for tuberculosis transmission. The much smaller sample sizes typically required when three rather than one or two tests are used should encourage researchers to plan their studies using more than two diagnostic tests whenever possible. User-friendly software is available for both the design and analysis stages, greatly facilitating the use of these methods.
7.
Power and sample size for DNA microarray studies
A microarray study aims at having a high probability of declaring genes to be differentially expressed if they are truly differentially expressed, while keeping the probability of making false declarations of expression acceptably low. Thus, in formal terms, well-designed microarray studies will have high power while controlling the type I error risk. Achieving this objective is the purpose of this paper. Here, we discuss conceptual issues and present computational methods for statistical power and sample size in microarray studies, taking account of the multiple testing that is generic to these studies. The discussion encompasses choices of experimental design and replication for a study. Practical examples are used to demonstrate the methods. The examples show forcefully that replication of a microarray experiment can yield large increases in statistical power. The paper refers to cDNA arrays in the discussion and illustrations, but the proposed methodology is equally applicable to expression data from oligonucleotide arrays.
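One elementary way to account for the multiple testing in such a calculation is a Bonferroni-adjusted per-gene significance level in the usual two-sample formula. A sketch of that simple case only (not the paper's full procedure; effect size is in standard-deviation units):

```python
from math import ceil
from scipy.stats import norm

def arrays_per_group(effect_sd_units, n_genes, alpha_family=0.05, power=0.9):
    """Replicates per group for a two-sided two-sample comparison with a
    Bonferroni-adjusted per-gene alpha: n = 2*(z_{1-a/2m} + z_power)^2 / d^2."""
    a = alpha_family / n_genes          # per-gene significance level
    z = norm.ppf(1 - a / 2) + norm.ppf(power)
    return ceil(2 * z ** 2 / effect_sd_units ** 2)

# A one-SD expression difference tested across 10,000 genes:
print(arrays_per_group(1.0, 10_000))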
8.
René Schmidt, Robert Kwiecien, Andreas Faldum, Frank Berthold, Barbara Hero, Sandra Ligges. Statistics in Medicine 2015; 34(6):1031-1040
An improved method of sample size calculation for the one-sample log-rank test is provided. The one-sample log-rank test may be the method of choice if the survival curve of a single treatment group is to be compared with that of a historic control. Such settings arise, for example, in clinical phase-II trials if the response to a new treatment is measured by a survival endpoint. Present sample size formulas for the one-sample log-rank test are based on the number of events to be observed; that is, to achieve approximately the desired power at the allocated significance level for a given effect, the trial is stopped as soon as a certain critical number of events is reached. We propose a new stopping criterion. Both approaches are shown to be asymptotically equivalent. For small sample sizes, though, a simulation study indicates that the new criterion might be preferred when planning a corresponding trial. In our simulations, a trial using the traditional stopping criterion based on the number of events is usually underpowered and does not exploit the aspired significance level, whereas a trial based on the new stopping criterion maintains power with the type-I error rate still controlled.
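For reference, the classical one-sample log-rank statistic compares the observed number of events with the expectation accumulated under the historic control's hazard. A sketch of that traditional statistic only (the paper's new stopping criterion is not reproduced; `cum_hazard` is assumed to be the control's cumulative hazard function Λ0):

```python
import numpy as np

def one_sample_logrank(times, events, cum_hazard):
    """Classical one-sample log-rank statistic Z = (O - E) / sqrt(E).
    times: follow-up times of all subjects (events and censorings);
    events: 1 if the subject had an event, else 0;
    cum_hazard: cumulative hazard of the historic control, Λ0(t)."""
    O = np.sum(events)
    E = np.sum([cum_hazard(t) for t in times])
    return (O - E) / np.sqrt(E)
```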
9.
The need for statistical methodologies for analysing small studies, such as pilot or so-called 'proof of concept' studies, has not received much attention in the past. Recently the Institute of Medicine (IOM) formed a committee and held a workshop to discuss methodologies for conducting clinical trials with small numbers of participants. In this paper we argue that the hypothesis of a treatment effect in a small pilot study should be set up to test whether any individual subject has an effect, rather than whether the group mean or median has shifted, as is often done for large, confirmatory clinical trials. Based on this paradigm, we propose multiple test procedures as one option when individuals have enough observations, and a mixture-distribution approach when individuals have one or more observations. The latter approach may be used in either a one- or two-group setting, and is our focus in this paper. We present the likelihood ratio tests for the mixture models. Examples are given to demonstrate the methods.
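A toy one-group version of the mixture-distribution idea: under the null no subject responds, under the alternative an unknown fraction shifts by delta. This sketch assumes one observation per subject, unit variance, and a known shift, none of which is required by the paper:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def mixture_lrt(x, delta):
    """LRT of 'no subject responds' (all N(0,1)) versus 'a fraction pi of
    subjects shifts by delta' (mixture of N(0,1) and N(delta,1))."""
    ll0 = norm.logpdf(x).sum()
    def negll(pi):  # negative log-likelihood of the mixture in pi
        return -np.log(pi * norm.pdf(x - delta) + (1 - pi) * norm.pdf(x)).sum()
    res = minimize_scalar(negll, bounds=(1e-6, 1 - 1e-6), method="bounded")
    return 2 * (-res.fun - ll0)  # 2 * (max H1 log-lik - H0 log-lik)
```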
10.
Recently, Stewart and Ruberg proposed the use of contrast tests for detecting dose-response relationships. They considered in particular bivariate contrasts for healing rates and gave several possibilities for defining adequate sets of coefficients. This paper extends their work in several directions. First, asymptotic power expressions for both single and multiple contrast tests are derived. Secondly, well-known trend tests are rewritten as multiple contrast tests, thus alleviating the inherent problem of choosing adequate contrast coefficients. Thirdly, recent results on the efficient calculation of multivariate normal probabilities replace the traditional simulation-based methods for the numerical computations. Modifications of the power formulae allow the calculation of sample sizes for given type I and II errors, the spontaneous rate, and the dose-response shape. Some numerical results of a power study for small to moderate sample sizes show that the nominal power is a reasonably good approximation to the actual power. An example from a clinical trial illustrates the practical use of the results.
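The single-contrast case has a closed-form asymptotic power that is easy to sketch; the multiple-contrast power requires multivariate normal probabilities and is omitted here. A sketch under a normal approximation to the rates (illustrative, not the paper's exact expressions):

```python
import numpy as np
from scipy.stats import norm

def contrast_test_power(p, n, c, alpha=0.05):
    """Asymptotic power of a single one-sided contrast test of rates.
    p: true rates per dose group; n: group sizes; c: contrast (sums to 0)."""
    p, n, c = map(np.asarray, (p, n, c))
    effect = np.dot(c, p)
    se = np.sqrt(np.sum(c ** 2 * p * (1 - p) / n))
    return norm.cdf(effect / se - norm.ppf(1 - alpha))

# A linear contrast over four dose groups of 50 patients each:
print(contrast_test_power([0.1, 0.2, 0.3, 0.4], [50] * 4, [-3, -1, 1, 3]))
```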
11.
Zhao YD. Statistics in Medicine 2006; 25(15):2675-2687
The van Elteren test is a type of stratified Wilcoxon-Mann-Whitney test for comparing two treatments accounting for strata. In this paper, we study sample size estimation methods for the asymptotic version of the van Elteren test, assuming that the stratum fractions (ratios of each stratum size to the total sample size) and the treatment fractions (ratios of each treatment size to the stratum size) are known in the study design. In particular, we develop three large-sample sample size estimation methods and present a real data example to illustrate the necessary information in the study design phase in order to apply the methods. Simulation studies are conducted to compare the performance of the methods and recommendations are made for method choice. Finally, sample size estimation for the van Elteren test when the stratum fractions are unknown is also discussed.
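For orientation, the asymptotic van Elteren statistic itself can be written compactly: each stratum's Wilcoxon rank sum is centered and weighted by 1/(n + 1). A sketch assuming no ties (sample size planning then works with the moments of this statistic):

```python
import numpy as np
from scipy.stats import rankdata, norm

def van_elteren(strata):
    """van Elteren stratified Wilcoxon test.
    strata: list of (x, y) pairs, one pair of treatment samples per stratum.
    Uses the no-ties variance formula for simplicity."""
    num = var = 0.0
    for x, y in strata:
        nx, ny = len(x), len(y)
        ranks = rankdata(np.concatenate([x, y]))
        w = 1.0 / (nx + ny + 1)
        num += w * (ranks[:nx].sum() - nx * (nx + ny + 1) / 2)
        var += w ** 2 * nx * ny * (nx + ny + 1) / 12.0
    z = num / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))
```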
12.
High-throughput sequencing technologies have enabled large-scale studies of the role of the human microbiome in health conditions and diseases. Microbial community-level association testing, a critical step in establishing the connection between overall microbiome composition and an outcome of interest, is now routinely performed in many studies. However, current microbiome association tests all focus on a single outcome. It has become increasingly common for a microbiome study to collect multiple, possibly related, outcomes to maximize the power of discovery. As these outcomes may share common mechanisms, jointly analyzing them can amplify the association signal and improve the statistical power to detect potential associations. We propose the multivariate microbiome regression-based kernel association test (MMiRKAT) for testing association between multiple continuous outcomes and overall microbiome composition, where the kernel used in MMiRKAT is based on the Bray-Curtis or UniFrac distance. MMiRKAT directly regresses all outcomes on the microbiome profiles via a semiparametric kernel machine regression framework, which allows for covariate adjustment and evaluates the association via a variance-component score test. Because most current microbiome studies have small sample sizes, a novel small-sample correction procedure is implemented in MMiRKAT to correct for the conservativeness of the association test when the sample size is small or moderate. The proposed method is assessed via simulation studies and an application to a real data set examining the association between host gene expression and mucosal microbiome composition. We demonstrate that MMiRKAT is more powerful than the large-sample multivariate kernel association test, while controlling the type I error. A free implementation of MMiRKAT in the R language is available at http://research.fhcrc.org/wu/en.html.
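The general shape of a kernel machine variance-component score statistic is Q = r'Kr, with r the residuals from the null covariate model. A single-outcome (MiRKAT-style) sketch only; K would be a kernel built from Bray-Curtis or UniFrac distances, and MMiRKAT's multivariate extension and small-sample correction are not reproduced here:

```python
import numpy as np

def kernel_score_stat(y, X, K):
    """Score statistic Q = r' K r for a variance-component test.
    y: outcome vector; X: covariate matrix; K: n x n kernel matrix."""
    X1 = np.column_stack([np.ones(len(y)), X])   # null model: covariates only
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = y - X1 @ beta                            # null-model residuals
    return float(r @ K @ r)
```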
13.
Studies involving two methods for measuring a continuous response are regularly conducted in health sciences to evaluate agreement of a method with itself and agreement between methods. Notwithstanding their wide usage, the design of such studies, in particular, the sample size determination, has not been addressed in the literature when the goal is the simultaneous evaluation of intra- and inter-method agreement. We fill this need by developing a simulation-based Bayesian methodology for determining sample sizes in a hierarchical model framework. Unlike a frequentist approach, it takes into account uncertainty in parameter estimates. This methodology can be used with any scalar measure of agreement available in the literature. We demonstrate this for four currently used measures. The proposed method is applied to an ongoing proteomics project, where we use pilot data to determine the number of individuals and the number of replications needed to evaluate the agreement between two methods for measuring protein ratios. We also apply our method to determine the sample size for an experiment involving measurement of blood pressure.
14.
The design of a study of disease screening tests may be based on hypothesis tests for the sensitivity and specificity of the tests. The case-control study requires knowledge of the disease status of patients at the time of enrollment. This may not be possible in a prospective setting, when the gold standard is obtained subsequent to the initial screening and the number of diseased individuals is random and cannot be fixed by design. Several ad hoc procedures for determining the total sample size are commonly used by practitioners, for example, the prevalence inflation method. The properties of these methods are not well understood. We develop a formal method for sample size and power calculations based on the unconditional power properties of the test statistics. The approach provides novel insights into the behaviour of the commonly used methods. We find that the ad hoc prevalence inflation method may serve as a useful approximation to our rigorous framework for sample size determination in the prospective set-up. The design of a large population-based study of mammography for breast cancer screening illustrates the key issues.
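The ad hoc prevalence inflation method the abstract evaluates amounts to sizing the diseased subsample and dividing by the prevalence. A sketch in its simplest Wald-interval form (illustrative; the paper's unconditional framework is not reproduced):

```python
from math import ceil
from scipy.stats import norm

def prevalence_inflated_n(se_expected, half_width, prevalence, alpha=0.05):
    """Size the diseased subsample for estimating sensitivity to the given
    precision, then inflate by the disease prevalence for the total cohort."""
    z = norm.ppf(1 - alpha / 2)
    n_diseased = ceil(z ** 2 * se_expected * (1 - se_expected) / half_width ** 2)
    return ceil(n_diseased / prevalence)

# Sensitivity ~0.85 estimated to within ±0.05 at 1% disease prevalence:
print(prevalence_inflated_n(0.85, 0.05, 0.01))
```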
15.
Giraudeau, Ravaud and Donner in 2008 presented a formula for sample size calculations for cluster randomised crossover trials, when the intracluster correlation coefficient, interperiod correlation coefficient and mean cluster size are specified in advance. However, in many randomised trials, the number of clusters is constrained in some way, but the mean cluster size is not. We present a version of the Giraudeau formula for sample size calculations for cluster randomised crossover trials when the number of clusters is fixed. Formulae are given for the minimum number of clusters, the maximum cluster size and the relationship between the correlation coefficients when there are constraints on both the number of clusters and the cluster size. Our version of the formula may aid the efficient planning and design of cluster randomised crossover trials.
16.
We present a new likelihood-based approach for constructing confidence intervals for effect sizes that is applicable to small samples. We also conduct a simulation study to compare the coverage probability of the new likelihood-based method with three other methods proposed by Hedges and Olkin and by Kraemer and Paik. The simulation studies show that the confidence interval generated by the modified signed log-likelihood ratio method possesses essentially exact coverage probabilities even for small samples, although the coverage probabilities are consistently, but only slightly, less than the nominal level. The methods are also applied to two examples.
17.
Minimization of sample size when comparing two small probabilities in a non-inferiority safety trial
In clinical trials, the success rates of the two treatments to be compared often range from 10 to 90 per cent. When the comparison probabilities are (much) smaller than 10 per cent, standard methods for sample size and power calculations may provide invalid results. This situation may occur when there is interest in safety rather than in efficacy. In such trials, no more patients should be included than strictly necessary. We compared the results of maximum likelihood methods for the computation of sample sizes in a non-inferiority trial, including exact procedures, and considered unequal sample sizes for the experimental and reference treatments. An exact, unequal-sample-size maximum likelihood procedure is advocated when the specified non-zero risk difference under the null hypothesis is not too large. Such a procedure is also indicated when the parameter of interest is the relative risk, rather than the risk difference.
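For small event probabilities, exact power can be obtained by enumerating all binomial outcomes rather than relying on a normal approximation. A sketch using a simple Wald statistic as a stand-in for the maximum likelihood procedures of the paper (names and the boundary convention are illustrative):

```python
import numpy as np
from scipy.stats import binom, norm

def exact_ni_power(pE, pC, nE, nC, delta, alpha=0.025):
    """Exact (full-enumeration) power of a Wald non-inferiority test of
    H0: pE - pC >= delta against H1: pE - pC < delta."""
    z = norm.ppf(1 - alpha)
    power = 0.0
    for xE in range(nE + 1):
        for xC in range(nC + 1):
            pe, pc = xE / nE, xC / nC
            se = np.sqrt(pe * (1 - pe) / nE + pc * (1 - pc) / nC)
            # Boundary outcomes with zero SE are (conservatively) not rejected.
            if se > 0 and (pe - pc - delta) / se < -z:
                power += binom.pmf(xE, nE, pE) * binom.pmf(xC, nC, pC)
    return power

# Adverse-event rates of 2% in both arms, margin 5%, 150 vs 100 patients:
print(exact_ni_power(0.02, 0.02, 150, 100, 0.05))
```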
18.
Campbell I. Statistics in Medicine 2007; 26(19):3661-3675
Two-by-two tables commonly arise in comparative trials and cross-sectional studies. In medical studies, two-by-two tables may have a small sample size due to the rarity of a condition, or to limited resources. Current recommendations on the appropriate statistical test mostly specify the chi-squared test for tables where the minimum expected number is at least 5 (following Fisher and Cochran), and otherwise the Fisher-Irwin test; but there is disagreement on which versions of the chi-squared and Fisher-Irwin tests should be used. A further uncertainty is that, according to Cochran, the number 5 was chosen arbitrarily. Computer-intensive techniques were used in this study to compare seven two-sided tests of two-by-two tables in terms of their Type I errors. The tests were K. Pearson's and Yates's chi-squared tests and the 'N-1' chi-squared test (first proposed by E. Pearson), together with four versions of the Fisher-Irwin test (including two mid-P versions). The optimum test policy was found to be analysis by the 'N-1' chi-squared test when the minimum expected number is at least 1, and otherwise, by the Fisher-Irwin test by Irwin's rule (taking the total probability of tables in either tail that are as likely as, or less likely than the one observed). This policy was found to have increased power compared to Cochran's recommendations.
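The recommended policy is easy to encode: the 'N-1' chi-squared statistic is the Pearson statistic scaled by (N-1)/N. A sketch (scipy's two-sided Fisher test, which sums the probabilities of tables as likely as or less likely than the observed one, is used here as the Fisher-Irwin test by Irwin's rule):

```python
import numpy as np
from scipy.stats import chi2, fisher_exact

def two_by_two_test(table):
    """Campbell's recommended policy for a 2x2 table: the 'N-1' chi-squared
    test when every expected count is >= 1, otherwise the Fisher-Irwin test."""
    t = np.asarray(table, dtype=float)
    N = t.sum()
    expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / N
    if expected.min() >= 1:
        a, b, c, d = t.ravel()
        stat = ((a * d - b * c) ** 2 * (N - 1)
                / (t.sum(axis=1).prod() * t.sum(axis=0).prod()))
        return "N-1 chi-squared", chi2.sf(stat, df=1)
    return "Fisher-Irwin", fisher_exact(table)[1]

print(two_by_two_test([[3, 9], [8, 4]]))
```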
19.
In clinical trials with t-distributed test statistics, the required sample size depends on the unknown variance. Taking estimates from previous studies often leads to a misspecification of the true value of the variance. Hence, re-estimating the variance from the collected data and re-calculating the required sample size is attractive. We present a flexible method for extensions of fixed-sample or group-sequential trials with t-distributed test statistics. The method can be applied at any time during the course of the trial and does not require a pre-specified sample size re-calculation rule; all available information can be used to determine the new sample size. The advantage of our method over other adaptive methods is that the efficient t-test design is maintained when no extensions are actually made. We show that the type I error rate is preserved.
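The basic recalculation step plugs the interim variance estimate into the usual two-sample formula. A sketch of that step only (the paper's contribution is performing such extensions while preserving the type I error rate, which this sketch does not address):

```python
from math import ceil
from scipy.stats import norm

def recalculated_n_per_group(interim_sd, delta, alpha=0.05, power=0.8):
    """Re-estimate the per-group sample size for a two-sided two-sample
    comparison by plugging the interim SD into the normal-approximation
    formula n = 2 * ((z_{1-a/2} + z_power) * sd / delta)^2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z * interim_sd / delta) ** 2)

# Planned with SD 8; interim data suggest SD 11 (difference to detect: 5):
print(recalculated_n_per_group(8, 5), "->", recalculated_n_per_group(11, 5))
```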
20.
Heller G. Statistics in Medicine 2006; 25(15):2543-2553
Power calculations in a small sample comparative study, with a continuous outcome measure, are typically undertaken using the asymptotic distribution of the test statistic. When the sample size is small, this asymptotic result can be a poor approximation. An alternative approach, using a rank based test statistic, is an exact power calculation. When the number of groups is greater than two, the number of calculations required to perform an exact power calculation is prohibitive. To reduce the computational burden, a Monte Carlo resampling procedure is used to approximate the exact power function of a k-sample rank test statistic under the family of Lehmann alternative hypotheses. The motivating example for this approach is the design of animal studies, where the number of animals per group is typically small.
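The Monte Carlo idea is easy to sketch: Lehmann alternatives G = F^θ can be simulated as U^(1/θ) on uniforms, and because rank statistics are distribution-free the uniform base distribution is without loss of generality. A sketch with the Kruskal-Wallis test standing in for the paper's k-sample rank statistic:

```python
import numpy as np
from scipy.stats import kruskal

def mc_power_lehmann(thetas, n_per_group, alpha=0.05, n_sim=5000, seed=7):
    """Monte Carlo power of the Kruskal-Wallis test under Lehmann
    alternatives G_k = F^theta_k (simulated as U^(1/theta))."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        groups = [rng.uniform(size=n_per_group) ** (1.0 / th) for th in thetas]
        if kruskal(*groups).pvalue < alpha:
            rejections += 1
    return rejections / n_sim

# Three groups of 8 animals under increasingly strong Lehmann shifts:
print(mc_power_lehmann([1.0, 2.0, 3.0], n_per_group=8))
```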