Similar Documents
20 similar documents found (search time: 15 ms).
1.
Tests for equivalence or non-inferiority for paired binary data.  (Cited 7 times: 0 self-citations, 7 by others)
Assessment of therapeutic equivalence or non-inferiority between two medical diagnostic procedures often involves comparisons of the response rates between paired binary endpoints. The commonly used and accepted approach to assessing equivalence is to compare the asymptotic confidence interval on the difference of the two response rates with some clinically meaningful equivalence limits. This paper investigates two asymptotic test statistics, a Wald-type (sample-based) test statistic and a restricted maximum likelihood estimation (RMLE-based) test statistic, to assess equivalence or non-inferiority based on paired binary endpoints. The sample size and power functions of the two tests are derived. The actual type I error and power of the two tests are computed by enumerating the exact probabilities in the rejection region. The results show that the RMLE-based test controls type I error better than the sample-based test. To establish equivalence between two treatments with a symmetric equivalence limit of 0.15, a minimum sample size of 120 is needed. The RMLE-based test without the continuity correction performs well at the boundary point 0. A numerical example illustrates the proposed procedures.
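
As an illustration of the sample-based approach, here is a minimal Python sketch (not the authors' code) of a Wald-type non-inferiority test for paired binary endpoints. The function name, inputs and example counts are hypothetical; the RMLE-based variant, which replaces the sample proportions with restricted MLEs computed under the null boundary, is omitted.

```python
import numpy as np
from scipy.stats import norm

def paired_wald_noninferiority(n10, n01, n, delta=0.15, alpha=0.05):
    """Wald-type (sample-based) non-inferiority test for paired binary endpoints.

    n10 : pairs with test response = 1, reference response = 0
    n01 : pairs with test response = 0, reference response = 1
    n   : total number of pairs
    Tests H0: p_test - p_ref <= -delta against H1: p_test - p_ref > -delta.
    """
    d_hat = (n10 - n01) / n                                   # difference in response rates
    var_hat = (n10 + n01 - (n10 - n01) ** 2 / n) / n ** 2     # sample-based variance of d_hat
    z = (d_hat + delta) / np.sqrt(var_hat)
    return z, z > norm.ppf(1 - alpha)

# Hypothetical example: 120 pairs, 18 vs 12 discordant pairs, margin 0.15
print(paired_wald_noninferiority(18, 12, 120))
```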

2.
This paper demonstrates an inflation of the type I error rate that occurs when testing the statistical significance of a continuous risk factor after adjusting for a correlated continuous confounding variable that has been divided into a categorical variable. We used Monte Carlo simulation methods to assess the inflation of the type I error rate when testing the statistical significance of a risk factor after adjusting for a continuous confounding variable that has been divided into categories. We found that the inflation of the type I error rate increases with increasing sample size, with increasing correlation between the risk factor and the confounding variable, and with a decreasing number of categories into which the confounder is divided. Even when the confounder is divided into a five-level categorical variable, the inflation of the type I error rate remained high when both the sample size and the correlation between the risk factor and the confounder were high.
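
The mechanism can be illustrated with a small Monte Carlo sketch in Python; the parameter values and the linear-model setup below are illustrative assumptions, not the paper's simulation design. The outcome depends only on the continuous confounder, the confounder is cut into quantile-based categories, and the rejection rate for the correlated risk factor is tallied.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def one_replicate(n=1000, rho=0.7, n_cats=5):
    """One simulated data set under the null: the outcome depends only on the
    continuous confounder z, so a significant risk factor x is a type I error."""
    z = rng.standard_normal(n)
    x = rho * z + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)   # correlated risk factor
    y = 0.5 * z + rng.standard_normal(n)
    cuts = np.quantile(z, np.linspace(0, 1, n_cats + 1)[1:-1])
    z_cat = np.digitize(z, cuts)                                   # categorized confounder
    dummies = np.eye(n_cats)[z_cat][:, 1:]                         # reference-coded categories
    design = sm.add_constant(np.column_stack([x, dummies]))
    return sm.OLS(y, design).fit().pvalues[1] < 0.05               # test of x after adjustment

rejections = np.mean([one_replicate() for _ in range(2000)])
print(f"empirical type I error rate: {rejections:.3f}")            # exceeds the nominal 0.05
```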

3.
We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage model uses the cluster-specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach. Copyright © 2015 John Wiley & Sons, Ltd.
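
A rough Python sketch of the inverse-variance-weighted two-stage analysis described above, under simplifying assumptions: the variance components are supplied by the user, the arm indicator is coded 0/1, the inputs are NumPy arrays, and a normal reference distribution is used instead of the small-sample degrees of freedom studied in the paper.

```python
import numpy as np
from scipy import stats

def two_stage_weighted(y, cluster, arm, sigma2_e, sigma2_c):
    """Two-stage sketch: cluster means as outcomes, each weighted by the inverse of
    its theoretical variance, w_i = 1 / (sigma2_c + sigma2_e / m_i), with the
    between-cluster variance component constrained to be non-negative."""
    sigma2_c = max(sigma2_c, 0.0)
    means, weights, arms = [], [], []
    for c in np.unique(cluster):
        idx = cluster == c
        means.append(y[idx].mean())
        weights.append(1.0 / (sigma2_c + sigma2_e / idx.sum()))
        arms.append(arm[idx][0])
    means, weights, arms = map(np.asarray, (means, weights, arms))
    est = [np.average(means[arms == a], weights=weights[arms == a]) for a in (0, 1)]
    var = [1.0 / weights[arms == a].sum() for a in (0, 1)]
    z = (est[1] - est[0]) / np.sqrt(var[0] + var[1])
    return 2 * stats.norm.sf(abs(z))   # normal reference; the paper uses t-type references
```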

4.
The present study introduces new Haplotype Sharing Transmission/Disequilibrium Tests (HS-TDTs) that allow for random genotyping errors. We evaluate the type I error rate and power of the newly proposed tests under a variety of scenarios and perform a power comparison among the proposed tests, the HS-TDT and the single-marker TDT. The results indicate that the HS-TDT shows a significant increase in type I error when applied to data in which either Mendelian-inconsistent trios are removed or Mendelian-inconsistent markers are treated as missing genotypes, and the magnitude of the type I error increases both with an increase in sample size and with an increase in genotyping error rate. The results also show that a simple strategy, namely merging each rare haplotype into the most similar common haplotype, can control the type I error inflation for a wide range of genotyping error rates, and after merging rare haplotypes, the power of the test is very similar to that without merging the rare haplotypes. Therefore, we conclude that this simple strategy may make the HS-TDT robust to genotyping errors. Our simulation results also show that this strategy may be applicable to other haplotype-based TDTs.

5.
Sample size re-estimation based on an observed difference can ensure adequate power and potentially save a large amount of time and resources in clinical trials. One concern with such an approach is that it may inflate the type I error. However, such a possible inflation has not been mathematically quantified. In this paper the mathematical mechanism of this inflation is explored for two-sample normal tests. A (conditional) type I error function based on normal data is derived. This function not only provides the quantification but also gives the mathematical mechanisms of possible inflation in the type I error due to the sample size re-estimation. Theoretically, based on their decision rules (certain upper and lower bounds), investigators can calculate this function and exactly visualize the changes in type I error. Computer simulations are performed to verify the results. If there are no bounds for the adjustment, the inflation is evident. If proper adjusting rules are used, the inflation can be well controlled. In some cases the type I error can even be reduced. The trade-off is to give up some 'unrealistic power'. We investigated several scenarios in which the mechanisms that change the type I error are different. Our simulations show that similar results may apply to other distributions.
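
The inflation produced by unadjusted re-estimation can be illustrated with a simulation sketch in Python. The re-sizing rule, interim fraction and bounds below are illustrative assumptions rather than the decision rules analysed in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2023)

def one_trial(n1=50, n_planned=100, alpha=0.05, resize=True, m_max=4):
    """Two-sample normal test under H0 (both arms N(0,1), variance known), with the
    second-stage size chosen from the observed interim difference and no adjustment
    of the final critical value: the situation in which inflation arises."""
    z_crit = norm.ppf(1 - alpha / 2)
    stage1 = rng.standard_normal((2, n1))
    n_total = n_planned
    if resize:
        d1 = abs(stage1[0].mean() - stage1[1].mean())
        n_needed = 2 * ((z_crit + norm.ppf(0.8)) / max(d1, 1e-6)) ** 2   # 80% power at observed d
        n_total = int(np.clip(np.ceil(n_needed), n_planned, m_max * n_planned))
    stage2 = rng.standard_normal((2, n_total - n1))
    x = np.concatenate([stage1[0], stage2[0]])
    y = np.concatenate([stage1[1], stage2[1]])
    z = (x.mean() - y.mean()) / np.sqrt(2 / n_total)
    return abs(z) > z_crit

print(np.mean([one_trial() for _ in range(20000)]))   # rejection rate above the nominal 0.05
```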

6.
Consider a parallel group trial for the comparison of an experimental treatment to a control, where the second-stage sample size may depend on the blinded primary endpoint data as well as on additional blinded data from a secondary endpoint. For the setting of normally distributed endpoints, we demonstrate that this may lead to an inflation of the type I error rate if the null hypothesis holds for the primary but not the secondary endpoint. We derive upper bounds for the inflation of the type I error rate, both for trials that employ random allocation and for those that use block randomization. We illustrate the worst-case sample size reassessment rule in a case study. For both randomization strategies, the maximum type I error rate increases with the effect size in the secondary endpoint and the correlation between endpoints. The maximum inflation increases with smaller block sizes if information on the block size is used in the reassessment rule. Based on our findings, we do not question the well-established use of blinded sample size reassessment methods with nuisance parameter estimates computed from the blinded interim data of the primary endpoint. However, we demonstrate that the type I error rate control of these methods relies on the application of specific, binding, pre-planned and fully algorithmic sample size reassessment rules and does not extend to general or unplanned sample size adjustments based on blinded data. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.

7.
A nonparametric test for equality of survival medians  (Cited 1 time: 0 self-citations, 1 by others)
In clinical trials, researchers often encounter testing for equality of survival medians across study arms based on censored data. Even though Brookmeyer and Crowley introduced a method for comparing medians of several survival distributions, some researchers still misuse procedures that are designed for testing the homogeneity of survival curves. These procedures include the log-rank, Wilcoxon, and Cox models. This practice leads to inflation of the probability of a type I error, particularly when the underlying assumptions of these procedures are not met. We propose a new nonparametric method for testing the equality of several survival medians based on Kaplan-Meier estimation from randomly right-censored data. We derive asymptotic properties of this test statistic. Through simulations, we compute and compare the empirical probabilities of type I errors and the power of this new procedure with those of the Brookmeyer-Crowley, log-rank, and Wilcoxon methods. Our simulation results indicate that the performance of these test procedures depends on the level of censoring and the appropriateness of the underlying assumptions. When the objective is to test homogeneity of survival medians rather than survival curves and the assumptions of these tests are not met, some of these procedures severely inflate the probability of a type I error. In these situations, our test statistic provides an alternative to the Brookmeyer-Crowley test.
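
A building block for such a test is the Kaplan-Meier median itself; a minimal Python sketch follows, with hypothetical data. The asymptotic variance estimation needed to turn median differences into a formal test statistic is not shown.

```python
import numpy as np

def km_median(time, event):
    """Kaplan-Meier estimate of the median survival time from right-censored data
    (event = 1 for an observed event, 0 for censoring)."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    order = np.lexsort((1 - event, time))          # sort by time, events before censorings at ties
    time, event = time[order], event[order]
    n = len(time)
    s, surv = 1.0, np.empty(n)
    for i in range(n):
        if event[i]:
            s *= 1.0 - 1.0 / (n - i)               # at-risk set shrinks by one per observation
        surv[i] = s
    crossed = np.where(surv <= 0.5)[0]
    return time[crossed[0]] if crossed.size else np.nan

# Hypothetical data for two arms; the proposed test compares such medians across arms.
t1, e1 = [5, 8, 12, 12, 20, 25], [1, 1, 1, 0, 1, 0]
t2, e2 = [4, 6, 9, 15, 18, 30], [1, 1, 0, 1, 1, 1]
print(km_median(t1, e1), km_median(t2, e2))
```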

8.
In network meta-analyses that synthesize direct and indirect comparison evidence concerning multiple treatments, multivariate random effects models have been routinely used to address between-studies heterogeneity. Although their standard inference methods depend on large-sample approximations (e.g., restricted maximum likelihood estimation) in the number of trials synthesized, the numbers of trials are often moderate or small. In these situations, standard estimators cannot be expected to behave in accordance with asymptotic theory; in particular, confidence intervals cannot be assumed to exhibit their nominal coverage probabilities (and the type I error probabilities of the corresponding tests are not maintained). This invalidity may seriously influence the overall conclusions of network meta-analyses. In this article, we develop several improved inference methods for network meta-analyses to resolve these problems. We first introduce two efficient likelihood-based inference methods, the likelihood ratio test-based and efficient score test-based methods, in a general framework of network meta-analysis. Then, to improve small-sample inference, we develop improved higher-order asymptotic methods using Bartlett-type corrections and bootstrap adjustment methods. The proposed methods adopt Monte Carlo approaches using parametric bootstraps to circumvent the complicated analytical calculations of case-by-case analyses and to permit flexible application to various statistical models for network meta-analysis. These methods can also be straightforwardly applied to multivariate meta-regression analyses and to tests for the evaluation of inconsistency. In numerical evaluations via simulations, the proposed methods generally performed well compared with the ordinary restricted maximum likelihood-based inference method. Applications to two network meta-analysis datasets are provided.

9.
BACKGROUND AND OBJECTIVE: Publication bias and other sample size effects are issues for meta-analyses of test accuracy, as for randomized trials. We investigate limitations of standard funnel plots and tests when applied to meta-analyses of test accuracy and look for improved methods. METHODS: Type I and type II error rates for existing and alternative tests of sample size effects were estimated and compared in simulated meta-analyses of test accuracy. RESULTS: Type I error rates for the Begg, Egger, and Macaskill tests are inflated for typical diagnostic odds ratios (DOR), when disease prevalence differs from 50% and when thresholds favor sensitivity over specificity or vice versa. Regression and correlation tests based on functions of effective sample size are valid, if occasionally conservative, tests for sample size effects. Empirical evidence suggests that they have adequate power to be useful tests. When DORs are heterogeneous, however, all tests of funnel plot asymmetry have low power. CONCLUSION: Existing tests that use standard errors of odds ratios are likely to be seriously misleading if applied to meta-analyses of test accuracy. The effective sample size funnel plot and associated regression test of asymmetry should be used to detect publication bias and other sample size related effects.
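
One common form of an effective-sample-size regression test (a Deeks-style asymmetry test) is sketched below in Python; this is an assumption about the general construction and may differ in detail from the tests evaluated in the paper.

```python
import numpy as np
import statsmodels.api as sm

def ess_asymmetry_test(log_dor, n_diseased, n_nondiseased):
    """Regression test of funnel-plot asymmetry for test-accuracy meta-analysis based on
    the effective sample size (ESS) rather than the standard error of the log DOR."""
    n1, n2 = np.asarray(n_diseased, float), np.asarray(n_nondiseased, float)
    ess = 4 * n1 * n2 / (n1 + n2)                  # effective sample size per study
    X = sm.add_constant(1 / np.sqrt(ess))          # predictor: 1 / sqrt(ESS)
    fit = sm.WLS(np.asarray(log_dor, float), X, weights=ess).fit()
    return fit.params[1], fit.pvalues[1]           # slope and its p-value (asymmetry test)
```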

10.
The maternal-fetal genotype incompatibility (MFG) test can be used for a variety of genetic applications concerning disease risk in offspring, including testing for the presence of alleles that act directly through offspring genotypes (child allelic effects), alleles that act through maternal genotypes (maternal allelic effects), or maternal-fetal genotype incompatibilities. The log-linear version of the MFG model divides the genotype data into many cells, where each cell represents one of the possible mother, father, and child genotype combinations. Currently, tests of hypotheses about different allelic effects are carried out with an asymptotic MFG test, but it is unknown whether this is appropriate under conditions that produce small cell counts. In this report, we develop an exact MFG test that is based on the permutation distribution of cell counts. We determine by simulation the type I error and power of both the exact MFG test and the asymptotic MFG test for four different biologically relevant scenarios: a test of child allelic effects in the presence of maternal allelic effects, a test of maternal allelic effects in the presence of child allelic effects, and tests of maternal-fetal genotype incompatibility with and without child allelic effects. These simulations show that, in general, the exact test is slightly conservative whereas the asymptotic test is slightly anti-conservative. However, the asymptotic MFG test produces significantly inflated type I error rates under conditions with extreme null allele frequencies and sample sizes of 75, 100, and 150. Under these conditions, the exact test is clearly preferred over the asymptotic test. Under all other conditions that we tested, the user can safely choose either the exact test or the asymptotic test.

11.
Zhou XH, Li C, Gao S, Tierney WM. Statistics in Medicine 2001, 20(11):1703-1720
In this paper we propose five new tests for the equality of paired means of health care costs. The first two are parametric tests, a Z-score test and a likelihood ratio test, both derived under the bivariate normality assumption for the log-transformed costs. The third test (Z-score with jack-knife) is a semi-parametric Z-score method, which only requires marginal log-normal assumptions. The last two tests are non-parametric bootstrap tests: one is based on a t-test statistic, and the other is based on Johnson's modified t-test statistic. We conduct a simulation study to compare the performance of these tests, along with some commonly used tests, when the sample size is small to moderate. The simulation results demonstrate that the commonly used paired t-test on the log scale and the Wilcoxon signed rank test for differences on the original scale can yield type I error rates larger than the preset nominal levels. The commonly used paired t-test on the original data performs well with slightly skewed data, but can yield inaccurate results when the two populations have different skewness. The likelihood ratio test and the parametric and semi-parametric Z-score tests all have very good type I error control, with the likelihood ratio test being the best; however, the semi-parametric Z-score test requires fewer distributional assumptions than the two parametric tests. The percentile-t bootstrap test and the bootstrapped Johnson's modified t-test have better type I error control than the paired t-test on the original scale and Johnson's modified t-test, respectively. Combined with the propensity-score method, the proposed methods can also be applied to test the mean equality of two cost outcomes in the presence of confounders. Our two applications are from health services research. In the first, we examine the effect of a Medicaid reimbursement policy change on outpatient health care costs. The second evaluates the effect of a hospitalist programme on health care costs in an observational study, where imbalanced covariates between intervention and control patients are taken into account using a propensity score approach.
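
A minimal Python sketch of a percentile-t bootstrap test for paired means follows. Centring the within-pair differences to impose the null is a standard construction and may differ in detail from the paper's procedure; the function name and defaults are assumptions.

```python
import numpy as np

def percentile_t_bootstrap_paired(x, y, n_boot=4999, seed=0):
    """Percentile-t bootstrap test of equal paired means (e.g., paired health care costs):
    bootstrap the one-sample t statistic of the within-pair differences after centring
    them at zero to impose the null hypothesis."""
    rng = np.random.default_rng(seed)
    d = np.asarray(x, float) - np.asarray(y, float)
    n = d.size
    t_obs = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    d0 = d - d.mean()                              # centred differences: null of zero mean holds
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        db = rng.choice(d0, size=n, replace=True)
        t_star[b] = db.mean() / (db.std(ddof=1) / np.sqrt(n))
    return np.mean(np.abs(t_star) >= abs(t_obs))   # two-sided bootstrap p-value
```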

12.
We consider weighted logrank tests for interval-censored data when assessment times may depend on treatment, and for each individual we use only the two assessment times that bracket the event of interest. It is known that treating finite right endpoints as observed events can substantially inflate the type I error rate under assessment-treatment dependence (ATD), but the validity of several other implementations of weighted logrank tests (score tests, permutation tests, multiple imputation tests) has not been studied in this situation. With a bounded number of unique assessment times, the score test under the grouped continuous model retains the type I error rate asymptotically under ATD; however, although the approximate permutation test based on the permutation central limit theorem is not asymptotically valid under every ATD scenario, we show through simulation that in many ATD scenarios it retains the type I error rate better than the score test. We show a case where the approximate permutation test retains the type I error rate when the exact permutation test does not. We study and modify the multiple imputation logrank tests of Huang, Lee, and Yu (2008, Statistics in Medicine, 27:3217-3226), showing that the distribution of the rank-like scores asymptotically does not depend on the assessment times. We show through simulations that our modifications of the multiple imputation logrank tests retain the type I error rate in all cases studied, even with ATD and a small number of individuals in each treatment group. Simulations were performed using the interval R package. Published 2012. This article is a US Government work and is in the public domain in the USA.

13.
Genetic Analysis Workshop 17 (GAW17) focused on the transition from genome-wide association study designs and methods to the study designs and statistical genetic methods that will be required for the analysis of next-generation sequence data, including both common and rare sequence variants. In the 166 contributions to GAW17, a wide variety of statistical methods were applied to simulated traits in population- and family-based samples, and results from these analyses were compared to the known generating model. In general, many of the statistical genetic methods used in the population-based sample identified causal sequence variants (SVs) when the estimated locus-specific heritability, as measured in the population-based sample, was greater than about 0.08. However, SVs with locus-specific heritabilities less than 0.03 were rarely identified consistently. In the family-based samples, many of the methods detected SVs that were rarer than those detected in the population-based sample, but the estimated locus-specific heritabilities for these rare SVs, as measured in the family-based samples, were substantially higher (>0.2) than their corresponding heritabilities in the population-based samples. Substantial inflation of the type I error rate was observed across a wide variety of statistical methods. Although many of the contributions found little inflation in type I error for Q4, a trait with no causal SVs, type I error rates for Q1 and Q2 were well above their nominal levels, with the inflation for Q1 being higher than that for Q2. It seems likely that this inflation in type I error is due to correlations among SVs.

14.
A variety of prediction methods are used to relate high-dimensional genome data to a clinical outcome through a prediction model. Once a prediction model is developed from a data set, it should be validated using a resampling method or an independent data set. Although the existing prediction methods have been intensively evaluated by many investigators, there has not been a comprehensive study investigating the performance of the validation methods, especially with a survival clinical outcome. Understanding the properties of the various validation methods can allow researchers to perform more powerful validations while controlling the type I error. In addition, a sample size calculation strategy based on these validation methods is lacking. We conduct extensive simulations to examine the statistical properties of these validation strategies. In both simulations and a real data example, we found that 10-fold cross-validation with permutation gave the best power while controlling the type I error close to the nominal level. Based on this, we have also developed a sample size calculation method for designing a validation study with a user-chosen combination of prediction and validation methods. Microarray and genome-wide association study data are used as illustrations. The power calculation method presented here can be used in the design of any biomedical study involving high-dimensional data and survival outcomes.
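
The validation strategy found to work best can be sketched generically in Python. Here fit_and_score is a hypothetical user-supplied routine (for example, one that fits a penalized Cox model and returns a C-index on the held-out fold), and the fold and permutation counts are illustrative.

```python
import numpy as np

def cv_with_permutation(X, time, event, fit_and_score, k=10, n_perm=200, seed=0):
    """10-fold cross-validation with outcome permutation (a generic sketch).

    fit_and_score(X_tr, t_tr, e_tr, X_te, t_te, e_te) trains a prediction model and
    returns a validation score (larger = better). X, time, event are NumPy arrays.
    """
    rng = np.random.default_rng(seed)
    n = len(time)

    def cv_score(t, e):
        folds = np.array_split(rng.permutation(n), k)
        scores = []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            scores.append(fit_and_score(X[train_idx], t[train_idx], e[train_idx],
                                        X[test_idx], t[test_idx], e[test_idx]))
        return np.mean(scores)

    observed = cv_score(time, event)
    null_scores = []
    for _ in range(n_perm):
        p = rng.permutation(n)                     # permute the (time, event) outcome jointly
        null_scores.append(cv_score(time[p], event[p]))
    p_value = (1 + np.sum(np.asarray(null_scores) >= observed)) / (n_perm + 1)
    return observed, p_value
```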

15.
On the non-inferiority of a diagnostic test based on paired observations  (Cited 1 time: 0 self-citations, 1 by others)
Lu Y, Jin H, Genant HK. Statistics in Medicine 2003, 22(19):3029-3044
Non-inferiority of a diagnostic test to the standard or the optimum test is a common issue in medical research. Often we want to determine whether a new diagnostic test is as good as the standard reference test. Sometimes we are interested in an inexpensive test that may have an acceptably inferior sensitivity or specificity. While hypothesis testing procedures and sample size formulae for the equivalence of sensitivity or specificity alone have been proposed, very few studies have discussed simultaneous comparisons of both parameters. In this paper, we present three different testing procedures and sample size formulae for the simultaneous comparison of sensitivity and specificity based on paired observations with known disease status. These statistical procedures are then used to compare two classification rules that identify women at risk of future osteoporotic fracture. Simulation experiments demonstrate that the new tests and sample size formulae give the appropriate type I and II error rates. Differences between our approach and that of Lui and Cumberland are discussed.
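
A simple intersection-union sketch in Python of a simultaneous non-inferiority assessment of sensitivity and specificity from paired data. This is only one possible construction and is not necessarily one of the three procedures proposed in the paper; the margins and example counts are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def paired_wald_ni(b, c, n, margin, alpha=0.05):
    """One-sided paired Wald non-inferiority test for a difference in proportions.
    b, c: discordant pair counts (new+/standard- and new-/standard+); n: pairs in the stratum."""
    d = (b - c) / n
    se = np.sqrt(b + c - (b - c) ** 2 / n) / n
    return (d + margin) / se > norm.ppf(1 - alpha)

def simultaneously_noninferior(diseased, nondiseased, margin_se=0.05, margin_sp=0.05, alpha=0.05):
    """Intersection-union sketch: declare the new test non-inferior only if it is non-inferior
    on sensitivity (among the diseased) AND on specificity (among the non-diseased)."""
    return (paired_wald_ni(*diseased, margin=margin_se, alpha=alpha) and
            paired_wald_ni(*nondiseased, margin=margin_sp, alpha=alpha))

# Hypothetical counts: (b, c, n) within the diseased and the non-diseased strata
print(simultaneously_noninferior((9, 6, 150), (12, 10, 300)))
```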

16.
Linkage studies that aim to map susceptibility genes for complex diseases commonly test for excess allele sharing among affected relatives. Conventional methods based on identical-by-descent (IBD) allele sharing do not allow for possible differences among families, such as arise in the case of locus heterogeneity, and thus have reduced ability to detect linkage in the presence of such heterogeneity. We investigated two approaches to test for heterogeneity in allele sharing, using a family-level covariate that may be associated with different disease mechanisms leading to differences in allele sharing. Likelihood ratio tests for heterogeneity were formulated based on an extension of the linear and exponential likelihood models developed by Kong and Cox. Alternatively, we examined the asymptotic and permutation distributions of T-tests for differences between mean allele-sharing linkage scores from two covariate-defined family subgroups, assuming exchangeability. The size and power of the heterogeneity tests were evaluated for the S_all and S_pairs allele-sharing scoring functions using data sets of families with affected sibling and cousin pairs, generated under a model of locus heterogeneity. In certain simulation scenarios, the likelihood ratio test statistics did not follow the expected asymptotic distributions. The type I error estimates for the T-statistics conformed to the nominal 5% and 1% levels in all scenarios considered, and the corresponding power was comparable to that of the likelihood ratio tests. Application of these tests for heterogeneity detected significant differences in allele sharing between subgroups of families with inflammatory bowel disease.

17.
Various methods have been described for re-estimating the final sample size in a clinical trial based on an interim assessment of the treatment effect. Many re-weight the observations after re-sizing so as to control the pursuant inflation in the type I error probability alpha. Lan and Trost (Estimation of parameters and sample size re-estimation. Proceedings of the American Statistical Association Biopharmaceutical Section 1997; 48-51) proposed a simple procedure based on conditional power calculated under the current trend in the data (CPT). The study is terminated for futility if CPT ≤ CL, continued unchanged if CPT ≥ CU, or re-sized by a factor m to yield CPT = CU if CL < CPT < CU, where CL and CU are pre-specified probability levels. The overall level alpha can be preserved since the reduction due to stopping for futility can balance the inflation due to sample size re-estimation, thus permitting any form of final analysis with no re-weighting. Herein the statistical properties of this approach are described, including an evaluation of the probabilities of stopping for futility or re-sizing, the distribution of the re-sizing factor m, and the unconditional type I and II error probabilities alpha and beta. Since futility stopping does not allow a type I error but commits a type II error, then as the probability of stopping for futility increases, alpha decreases and beta increases. An iterative procedure is described for choosing the critical test value and the futility stopping boundary so as to ensure that the specified alpha and beta are obtained. However, inflation in beta is controlled by reducing the probability of futility stopping, which in turn dramatically increases the possible re-sizing factor m. The procedure is also generalized to limit the maximum sample size inflation factor, such as m_max = 4. However, doing so allows a non-trivial fraction of studies to be re-sized at this level while still having low conditional power. These properties also apply to other methods for sample size re-estimation with a provision for stopping for futility. Sample size re-estimation procedures should be used with caution, and the impact on the overall type II error probability should be assessed.
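
The Lan-Trost-type decision rule can be sketched as follows in Python, treating n as statistical information (e.g., a per-arm sample size with unit variance). The thresholds, cap and integer search are illustrative assumptions rather than the exact formulation studied in the paper.

```python
import numpy as np
from scipy.stats import norm

def cpt_rule(z_interim, n_interim, n_planned, alpha=0.025, cl=0.15, cu=0.90, m_max=4):
    """Sketch of the CPT rule: stop for futility if CPT <= cl, continue unchanged if
    CPT >= cu, otherwise increase the final sample size (capped at m_max * n_planned)
    until CPT reaches cu. CPT is conditional power under the current trend."""
    theta_hat = z_interim / np.sqrt(n_interim)            # current trend (drift per unit information)
    z_alpha = norm.ppf(1 - alpha)

    def cpt(n_final):
        drift = theta_hat * (n_final - n_interim)
        sd = np.sqrt(n_final - n_interim)
        return norm.sf((z_alpha * np.sqrt(n_final)
                        - z_interim * np.sqrt(n_interim) - drift) / sd)

    if cpt(n_planned) <= cl:
        return "stop for futility", n_planned
    if cpt(n_planned) >= cu:
        return "continue unchanged", n_planned
    for n_final in range(n_planned, m_max * n_planned + 1):   # smallest n reaching CPT >= cu
        if cpt(n_final) >= cu:
            return "re-size", n_final
    return "re-size (capped)", m_max * n_planned

# Hypothetical interim result: z = 1.2 at information 50 of a planned 100
print(cpt_rule(z_interim=1.2, n_interim=50, n_planned=100))
```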

18.
Our first purpose was to determine whether, in the context of a group-randomized trial (GRT) with Gaussian errors, permutation or mixed-model regression methods fare better in the presence of measurable confounding in terms of their Monte Carlo type I error rates and power. Our results indicate that, given a proper randomization, the type I error rate is similar for both methods, whether unadjusted or adjusted, even in small studies. However, our results also show that should the investigator face the unfortunate circumstance in which modest confounding exists in the only realization available, the unadjusted analysis risks a type I error; in this regard, there was little to distinguish the two methods. Finally, our results show that power is similar for the two methods and, not surprisingly, better for the adjusted tests.

Our second purpose was to examine the relative performance of permutation and mixed-model regression methods in the context of a GRT when the normality assumptions underlying the mixed model are violated. Published studies have examined the impact of violation of this assumption at the member level only. Our findings indicate that both methods perform well when the assumption is violated so long as the ICC is very small and the design is balanced at the group level. However, at ICC ≥ 0.01, the permutation test carries the nominal type I error rate while the model-based test is conservative and so less powerful. Binomial group- and member-level errors did not otherwise change the relative performance of the two methods with regard to confounding.
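
The unadjusted group-level permutation test can be sketched in Python as follows; covariate adjustment (e.g., permuting regression-adjusted group means) is not shown, and the example data are hypothetical.

```python
import numpy as np

def grt_permutation_test(group_means, arm, n_perm=9999, seed=0):
    """Group-level permutation test for a two-arm group-randomized trial:
    permute arm labels across whole groups and compare the means of the group means."""
    rng = np.random.default_rng(seed)
    group_means = np.asarray(group_means, float)
    arm = np.asarray(arm)
    observed = group_means[arm == 1].mean() - group_means[arm == 0].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(arm)
        diff = group_means[perm == 1].mean() - group_means[perm == 0].mean()
        count += abs(diff) >= abs(observed)
    return (count + 1) / (n_perm + 1)              # two-sided Monte Carlo p-value

# Hypothetical cluster-level means and arm labels (0 = control, 1 = intervention)
means = [2.1, 1.8, 2.5, 2.0, 2.9, 3.1, 2.7, 3.3]
arms = [0, 0, 0, 0, 1, 1, 1, 1]
print(grt_permutation_test(means, arms))
```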

19.
Kang SH, Chen JJ. Statistics in Medicine 2000, 19(16):2089-2100
This paper investigates an approximate unconditional test for non-inferiority between two independent binomial proportions. The P-value of the approximate unconditional test is evaluated using the maximum likelihood estimate of the nuisance parameter. In this paper, we clarify some differences in defining the rejection regions between the approximate unconditional test and the conventional conditional or unconditional exact tests. We compare the approximate unconditional test with the asymptotic test and the unconditional exact test of Chan (Statistics in Medicine, 17, 1403-1413, 1998) with respect to type I error and power. In general, the type I errors and powers are in decreasing order for the asymptotic, approximate unconditional and unconditional exact tests. In many cases, the type I errors from the asymptotic test are above the nominal level, while those from the unconditional exact test are below the nominal level. In summary, when the non-inferiority test is formulated in terms of the difference between two proportions, the approximate unconditional test is the most desirable, because it is easier to implement and generally more powerful than the unconditional exact test, and its size rarely exceeds the nominal size. However, when a test between two proportions is formulated in terms of the ratio of two proportions, such as a test of efficacy, more caution is needed in selecting a test procedure. The performance of the tests depends on the sample size and the range of plausible values of the nuisance parameter. Published in 2000 by John Wiley & Sons, Ltd.

20.
In many applications of linear mixed-effects models to longitudinal and multilevel data, especially from medical studies, it is of interest to test for the need for random effects in the model. It is known that classical tests such as the likelihood ratio, Wald, and score tests are not suitable for testing random effects because they suffer from testing on the boundary of the parameter space. Instead, permutation and bootstrap tests as well as Bayesian tests, which do not rely on the asymptotic distributions, avoid issues with the boundary of the parameter space. In this paper, we first develop a permutation test based on the likelihood ratio test statistic, which can easily be used for testing multiple random effects and any subset of them in linear mixed-effects models. The proposed permutation test extends two existing permutation tests. We then aim to compare permutation tests and Bayesian tests for random effects to find out which test is more powerful in which situations. Nothing is known about this in the literature, although it is an important practical problem given the usefulness of both methods in tackling the challenges of testing random effects. For this, we consider a Bayesian test developed using Bayes factors, for which we also propose a new alternative computation to avoid a computational issue it encounters in testing multiple random effects. Extensive simulations and a real data analysis are used to evaluate the proposed permutation test and to compare it with the Bayesian test. We find that both tests perform well, although the permutation test with the likelihood ratio statistic tends to provide relatively higher power when testing multiple random effects.
