Similar Literature
20 similar results found.
1.
Propensity score methods are used to reduce the effects of observed confounding when using observational data to estimate the effects of treatments or exposures. A popular method of using the propensity score is inverse probability of treatment weighting (IPTW). With this method, each subject receives a weight equal to the inverse of the probability of receiving the treatment that was actually received. These weights are then incorporated into the analyses to minimize the effects of observed confounding. Previous research has found that these methods yield unbiased estimates of the effect of treatment on survival outcomes, but that conventional methods of variance estimation produce biased estimates of the standard error. In this study, we conducted an extensive set of Monte Carlo simulations to examine different methods of variance estimation when using a weighted Cox proportional hazards model to estimate the effect of treatment. We considered three variance estimation methods: (i) a naïve model-based variance estimator; (ii) a robust sandwich-type variance estimator; and (iii) a bootstrap variance estimator. We considered estimation of both the average treatment effect and the average treatment effect in the treated. We found that the bootstrap estimator produced approximately correct estimates of standard errors and confidence intervals with approximately correct coverage rates, whereas the other estimators produced biased standard errors and confidence intervals with incorrect coverage rates. Our simulations were informed by a case study examining the effect of statin prescribing on mortality. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
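A minimal sketch of the weighting-and-bootstrap workflow described above, on synthetic data: scikit-learn fits the propensity model and lifelines fits the weighted Cox model. The data-generating model, the library choices, and the helper `iptw_cox_loghr` are illustrative assumptions, not the authors' code.

```python
# Sketch: ATE-style IPTW weights for a Cox model, with a bootstrap SE.
# Assumes scikit-learn and lifelines are available; data are synthetic.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                          # confounder
z = rng.binomial(1, 1 / (1 + np.exp(-x)))       # treatment depends on x
t = rng.exponential(1 / np.exp(0.5 * z + x))    # survival time
e = rng.binomial(1, 0.8, size=n)                # event indicator (some censoring)
df = pd.DataFrame({"x": x, "z": z, "time": t, "event": e})

def iptw_cox_loghr(d):
    ps = LogisticRegression().fit(d[["x"]], d["z"]).predict_proba(d[["x"]])[:, 1]
    d = d.assign(w=np.where(d["z"] == 1, 1 / ps, 1 / (1 - ps)))  # ATE weights
    cph = CoxPHFitter().fit(d[["z", "time", "event", "w"]],
                            duration_col="time", event_col="event",
                            weights_col="w", robust=True)
    return cph.params_["z"]

# Bootstrap: resample subjects and re-estimate the PS model in each resample.
boot = [iptw_cox_loghr(df.sample(n, replace=True, random_state=b)
                         .reset_index(drop=True)) for b in range(200)]
print("log-HR:", iptw_cox_loghr(df), "bootstrap SE:", np.std(boot, ddof=1))
```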

2.
Yan J, Fine J. Statistics in Medicine 2004;23(6):859-74; discussion 875-7, 879-80.
This paper investigates generalized estimating equations for association parameters, which are frequently of interest in family studies, with emphasis on covariance estimation. Separate link functions connect the mean, the scale, and the correlation to linear predictors involving possibly different sets of covariates, and separate estimating equations are proposed for the three sets of parameters. Simulations show that the robust 'sandwich' variance estimator and the jackknife variance estimator for the correlation parameters are generally close to the empirical variance for a sample size of 50 clusters. The results contradict Ziegler et al. and Kastner and Ziegler, where the 'sandwich' estimator obtained from the software MAREG was shown to be unsuitable for practical use. The problem appears to arise because the MAREG variance estimator does not account for variability in the estimation of the scale parameters, but it may be valid with a fixed scale. We also find that the formula for the approximate jackknife variance estimator in Ziegler et al. is deficient, resulting in systematic deviations from the fully iterated jackknife variance estimator. A general jackknife formula is provided and performs well in numerical studies. Data from a study on the genetics of alcoholism are used to illustrate the importance of reliable variance estimation in biomedical applications.
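For orientation, here is a sketch of the two variance estimators discussed above, the robust sandwich and a delete-one-cluster jackknife, applied to a GEE fit of the mean model only, using statsmodels on synthetic clustered binary data. The paper's separate estimating equations for the scale and correlation parameters are not reproduced here.

```python
# Sketch: robust sandwich vs delete-one-cluster jackknife SEs for a GEE fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_clusters, m = 50, 4
groups = np.repeat(np.arange(n_clusters), m)
u = rng.normal(scale=0.5, size=n_clusters)[groups]      # cluster effect
x = rng.normal(size=n_clusters * m)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x + u))))
X = sm.add_constant(x)

fit = sm.GEE(y, X, groups=groups, family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
print("sandwich SE:", fit.bse)                           # robust by default

# Delete-one-cluster jackknife.
est = []
for g in range(n_clusters):
    keep = groups != g
    est.append(sm.GEE(y[keep], X[keep], groups=groups[keep],
                      family=sm.families.Binomial(),
                      cov_struct=sm.cov_struct.Exchangeable()).fit().params)
est = np.array(est)
jk_var = (n_clusters - 1) / n_clusters * ((est - est.mean(0)) ** 2).sum(0)
print("jackknife SE:", np.sqrt(jk_var))
```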

3.
We examine the behaviour of the variance-covariance parameter estimates in an alternating binary Markov model with misclassification. Transition probabilities specify the state transitions for a process that is not directly observable. The state of an observable process, which may not correctly classify the state of the unobservable process, is obtained at discrete time points. Misclassification probabilities capture the two types of classification errors. Variance components of the estimated transition parameters are calculated with three estimation procedures: observed information, jackknife, and bootstrap techniques. Simulation studies are used to compare variance estimates and reveal the effect of misclassification on transition parameter estimation. The three approaches generally provide similar variance estimates for large samples and moderate misclassification. In these situations, the resampling methods are reasonable alternatives when programming partial derivatives is not appealing. With smaller chains or higher misclassification probabilities, the bootstrap method appears to be the best choice.

4.
The area under the ROC (receiver operating characteristic) curve, AUC, is one of the most commonly used measures for evaluating the performance of a binary classifier. Because of sampling variation, the model with the largest observed AUC score is not necessarily optimal, so it is crucial to assess the variation of the AUC estimate. We extend the proposal of Wang and Lindsay and devise an unbiased variance estimator for the AUC estimate, which has a two-sample U-statistic form. The proposal generalizes readily to estimating the variance of a K-sample U-statistic (K ≥ 2). To make the developed variance estimator more applicable, we employ a computationally efficient partition-resampling scheme. Simulation studies suggest that the developed AUC variance estimator performs comparably to or much better than jackknife and bootstrap variance estimators, with computation roughly 10 to 30 times faster than its counterparts. In practice, the proposal can be used in the one-standard-error rule for model selection, or to construct an asymptotic confidence interval for the AUC in binary classification. Beyond the simulation studies, we illustrate its practical application using two real datasets from the medical sciences.
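The paper's unbiased partition-resampling estimator is not reproduced here, but the two-sample U-statistic form of the AUC and the jackknife comparator mentioned above are easy to sketch in NumPy on synthetic scores; the pooled leave-one-out jackknife below is one standard variant.

```python
# Sketch: AUC as a two-sample U-statistic, with a pooled jackknife variance
# as a baseline comparator. Pure NumPy; data are synthetic scores.
import numpy as np

def auc(neg, pos):
    # U-statistic kernel: I(pos > neg) + 0.5 * I(tie), averaged over all pairs
    diff = pos[None, :] - neg[:, None]
    return ((diff > 0) + 0.5 * (diff == 0)).mean()

def auc_jackknife_var(neg, pos):
    n0, n1 = len(neg), len(pos)
    loo = ([auc(np.delete(neg, i), pos) for i in range(n0)] +
           [auc(neg, np.delete(pos, j)) for j in range(n1)])
    loo = np.array(loo)
    n = n0 + n1
    return (n - 1) / n * ((loo - loo.mean()) ** 2).sum()

rng = np.random.default_rng(2)
neg, pos = rng.normal(0, 1, 100), rng.normal(1, 1, 80)
print("AUC:", auc(neg, pos),
      "jackknife SE:", np.sqrt(auc_jackknife_var(neg, pos)))
```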

5.

Background

This work investigated the conditions under which confidence intervals around differences in mean costs from a cluster RCT are better estimated with a commonly used cluster-adjusted bootstrap than with methods that utilise the Huber-White robust estimator of variance. The bootstrap's main advantage lies in dealing with skewed data, which often characterise patient costs. However, it is insufficiently well recognised that one common method of adjusting the bootstrap for clustered data is valid only in large samples. In particular, the requirement that the number of randomised clusters be large is not satisfied in many cluster RCTs performed to date.

Methods

The coverage of confidence intervals for simple differences in mean costs, based on a robust (cluster-adjusted) standard error and on two cluster-adjusted bootstrap procedures, was compared in a large number of simulations. Parameters varied included the intracluster correlation coefficient, the sample size, and the distributions used to generate the data.

Results

The bootstrap's advantage in dealing with skewed data was found to be outweighed by its poor confidence interval coverage when the number of clusters was at the level frequently found in cluster RCTs in practice. Simulations showed that confidence intervals based on robust methods of standard error estimation achieved coverage rates between 93.5% and 94.8% for a 95% nominal level whereas those for the bootstrap ranged between 86.4% and 93.8%.

Conclusion

For the parameter combinations investigated here, 24 clusters per treatment arm is probably the minimum at which one would even begin to consider the bootstrap in preference to traditional robust methods; at least this many clusters, together with extremely skewed data, would be needed for the bootstrap to be favoured. Further investigation of more complex bootstrap procedures is needed if economic data from cluster RCTs are to be analysed appropriately.
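A stylized version of the comparison, assuming equal cluster sizes and synthetic gamma-distributed (skewed) costs; `cluster_se` is an illustrative cluster-means robust estimator, a simplification rather than the Huber-White form used in the study.

```python
# Sketch: cluster-resampling bootstrap vs a cluster-robust SE for a simple
# difference in mean costs; synthetic skewed (gamma) costs by cluster.
import numpy as np

rng = np.random.default_rng(3)
k = 10                                # clusters per arm (small, as in practice)

def arm_costs(shift):
    # one array of patient costs per cluster, with a cluster-level effect
    return [rng.gamma(2.0, 100 * np.exp(shift + rng.normal(0, 0.3)), size=30)
            for _ in range(k)]
ctrl, trt = arm_costs(0.0), arm_costs(0.2)

def mean_diff(a, b):
    return np.concatenate(b).mean() - np.concatenate(a).mean()

# Cluster-robust SE via between-cluster variability of cluster means
# (valid here because cluster sizes are equal).
def cluster_se2(arm):
    means = np.array([c.mean() for c in arm])
    return means.var(ddof=1) / len(means)
se_robust = np.sqrt(cluster_se2(ctrl) + cluster_se2(trt))

# Cluster bootstrap: resample whole clusters with replacement within each arm.
boot = []
for _ in range(1000):
    c = [ctrl[i] for i in rng.integers(0, k, k)]
    t = [trt[i] for i in rng.integers(0, k, k)]
    boot.append(mean_diff(c, t))
print("diff:", mean_diff(ctrl, trt), "robust SE:", se_robust,
      "bootstrap SE:", np.std(boot, ddof=1))
```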

6.
Wang M, Long Q. Statistics in Medicine 2011;30(11):1278-1291.
Generalized estimating equations (GEE; Biometrika 1986;73(1):13-22) are a general statistical method for fitting marginal models to correlated or clustered responses; they use a robust sandwich estimator for the variance-covariance matrix of the regression coefficient estimates. While this sandwich estimator is robust to misspecification of the correlation structure of the responses, its finite-sample performance deteriorates as the number of clusters or the number of observations per cluster decreases. To address this limitation, Pan (Biometrika 2001; 88(3):901-906) and Mancl and DeRouen (Biometrics 2001; 57(1):126-134) investigated two modifications of the original sandwich variance estimator. Motivated by the ideas underlying these two modifications, we propose a novel robust variance estimator that combines their strengths. Our theoretical and numerical results show that the proposed estimator attains better efficiency and better finite-sample performance than existing estimators. In particular, when the sample size or cluster size is small, the proposed estimator exhibits lower bias, and the resulting confidence intervals for GEE estimates achieve better coverage. We illustrate the proposed method using data from a dental study.
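To my knowledge, statsmodels exposes a bias-reduced (Mancl-DeRouen-type) GEE covariance via `cov_type="bias_reduced"`; if that option is missing in a given version, treat the sketch below as pseudocode. The authors' combined estimator itself is not available in statsmodels, and the data here are synthetic.

```python
# Sketch: standard sandwich vs bias-reduced covariance in statsmodels GEE,
# with few, small clusters (where the correction matters most).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
groups = np.repeat(np.arange(15), 3)              # 15 clusters of size 3
x = rng.normal(size=45)
y = 0.5 * x + rng.normal(size=45) + rng.normal(size=15)[groups]
X = sm.add_constant(x)

model = sm.GEE(y, X, groups=groups, family=sm.families.Gaussian(),
               cov_struct=sm.cov_struct.Exchangeable())
print("robust SE:      ", model.fit(cov_type="robust").bse.round(4))
print("bias-reduced SE:", model.fit(cov_type="bias_reduced").bse.round(4))
```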

7.
Heteroscedasticity is commonly encountered when fitting nonlinear regression models in practice. We discuss eight variance estimation methods for nonlinear regression models with heterogeneous response variances and present a simulation study comparing the eight methods in terms of estimating the standard errors of the fitted model parameters. The simulation study suggests that when the true variance is a function of the mean model, the power-of-the-mean variance function estimation method and the transform-both-sides method are the best choices for estimating the standard errors of the estimated model parameters. In general, the wild bootstrap estimator and two modified versions of the standard sandwich variance estimator are reasonably accurate with relatively small bias, especially when the heterogeneity is nonsystematic across values of the covariate. Furthermore, the two modified sandwich estimators are appealing choices in practice, given their computational advantage over the variance function estimation method and the transform-both-sides approach. Copyright © 2016 John Wiley & Sons, Ltd.
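A sketch of one of the eight methods, the wild bootstrap with Rademacher multipliers, for an exponential-decay model fitted with scipy's curve_fit; the model and error structure are invented for illustration.

```python
# Sketch: wild bootstrap SEs for a nonlinear model with heteroscedastic errors.
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    return a * np.exp(-b * x)

rng = np.random.default_rng(5)
x = np.linspace(0, 4, 80)
y = f(x, 5.0, 0.8) + rng.normal(scale=0.1 + 0.1 * x)   # variance grows with x

theta, _ = curve_fit(f, x, y, p0=[1, 1])
resid = y - f(x, *theta)

boot = []
for _ in range(500):
    v = rng.choice([-1.0, 1.0], size=len(x))           # Rademacher multipliers
    y_star = f(x, *theta) + resid * v                  # preserves each point's scale
    t_star, _ = curve_fit(f, x, y_star, p0=theta)
    boot.append(t_star)
print("estimates:", theta, "wild-bootstrap SEs:", np.std(boot, axis=0, ddof=1))
```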

8.
Publicly available national survey data are useful for evidence-based research that advances our understanding of important questions in the health and biomedical sciences. Appropriate variance estimation is a crucial step in evaluating the strength of evidence in a data analysis. In survey data analysis, the conventional linearization method for estimating the variance of a statistic of interest uses the variance estimator of the total based on linearized variables. We warn that, when unequal weights enter variance estimation, this common practice may have undesirable consequences such as susceptibility to data shift and severely inflated variance estimates. We propose using the variance estimator of the mean (mean-approach) instead of the variance estimator of the total (total-approach). We show the superiority of the mean-approach through analytical investigation. A real data example (the National Comorbidity Survey Replication) and simulation-based studies strongly support our conclusion.
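A stylized NumPy illustration of the central point, assuming a with-replacement design and ignoring other complex-survey features: with unequal weights, a total-approach variance estimate for the weighted mean changes under a location shift of y, whereas the residual-based mean-approach does not. Both formulas below are simplified stand-ins for the estimators analysed in the paper.

```python
# Sketch: shift-sensitivity of the total-approach vs the mean-approach
# when estimating var(weighted mean) with unequal weights.
import numpy as np

rng = np.random.default_rng(6)
n = 300
w = rng.uniform(0.5, 5.0, n)                       # unequal survey weights
y = rng.normal(50, 10, n)

def var_total_approach(y, w):
    u = w * y / w.sum()                            # contributions to the total
    return n / (n - 1) * ((u - u.mean()) ** 2).sum()

def var_mean_approach(y, w):
    e = w * (y - np.average(y, weights=w)) / w.sum()   # residual-based
    return n / (n - 1) * (e ** 2).sum()

for shift in (0, 1000):                            # shift all y by a constant
    print(f"shift={shift}: total-approach {var_total_approach(y + shift, w):.4f}, "
          f"mean-approach {var_mean_approach(y + shift, w):.4f}")
```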

9.
Shen H, Brown LD, Zhi H. Statistics in Medicine 2006;25(17):3023-3038.
In this paper, the problem of interest is efficient estimation of log-normal means. Several existing estimators are reviewed first, including the sample mean, the maximum likelihood estimator, the uniformly minimum variance unbiased estimator, and a conditional minimal mean squared error estimator. A new estimator is then proposed, and we show that it improves on the existing estimators in terms of squared error risk. The improvement is more pronounced with small sample sizes and large coefficients of variation, which are common in clinical pharmacokinetic (PK) studies. In addition, the new estimator is very easy to implement and provides a simple alternative for summarizing PK data, which are usually modelled with log-normal distributions. We also propose a parametric bootstrap confidence interval for log-normal means around the new estimator and illustrate its good coverage properties in a simulation study. Our estimator is compared with the existing ones via theoretical calculations and applications to real PK studies.
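The paper's improved estimator is not reproduced here, but a sketch can contrast the sample mean with the familiar plug-in MLE exp(mu + s^2/2) and show a parametric bootstrap CI of the kind proposed, on synthetic small-sample, PK-like data.

```python
# Sketch: sample mean vs plug-in MLE for a log-normal mean, plus a
# parametric-bootstrap CI around the MLE.
import numpy as np

rng = np.random.default_rng(7)
x = rng.lognormal(mean=1.0, sigma=1.0, size=25)    # small n, large CV
logx = np.log(x)
mu, s2 = logx.mean(), logx.var(ddof=1)

mle = np.exp(mu + s2 / 2)                          # plug-in MLE of E[X]
print("sample mean:", x.mean(), "MLE:", mle)

# Parametric bootstrap: simulate from the fitted log-normal, re-estimate.
boot = []
for _ in range(2000):
    xb = rng.lognormal(mu, np.sqrt(s2), size=len(x))
    lb = np.log(xb)
    boot.append(np.exp(lb.mean() + lb.var(ddof=1) / 2))
print("95% parametric bootstrap CI:", np.percentile(boot, [2.5, 97.5]))
```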

10.
We present a two-step approach for estimating hazard rates and, consequently, survival probabilities, by levels of a general categorical exposure. The resulting estimator utilizes three sources of data: vital statistics and census data are used at the first step to estimate the overall hazard rate for a given combination of gender and age group, and cohort data, constructed from a nationally representative complex survey with linked mortality records, are used at the second step to divide the overall hazard rate by exposure levels. We present an explicit expression for the resulting estimator and consider two methods of variance estimation that account for the complex multistage sample design: (1) the leave-one-out jackknife method, and (2) the Taylor linearization method, which provides an analytic formula for the variance estimator. The methods are illustrated with smoking and all-cause mortality data from the US National Health Interview Survey Linked Mortality Files, and the proposed estimator is compared with a previously studied crude hazard rate estimator that uses survey data only. The advantages of the two-step approach and possible extensions of the proposed estimator are discussed. Copyright © 2015 John Wiley & Sons, Ltd.

11.
The least-squares estimator of the slope in a simple linear regression model is biased towards zero when the predictor is measured with random error. A corrected slope may be estimated by adding data from a reliability study, which comprises a subset of subjects from the main study. The precision of this corrected slope depends on the design of the reliability study and on the choice of estimator. Previous work has assumed that the reliability study constitutes a random sample from the main study; a more efficient design is to use subjects with extreme values on their first measurement. Previously, we published a variance formula for the corrected slope when the correction factor is the slope in the regression of the second measurement on the first. In this paper we show that both designs are improved by maximum likelihood estimation (MLE). The precision gain is explained by the inclusion of data from all subjects for estimating the predictor's variance and by the use of the second measurement for estimating the covariance between response and predictor. The gain from MLE increases with a stronger true relationship between response and predictor and with lower precision in the predictor measurements. We present a real data example on the relationship between fasting insulin, a surrogate marker, and true insulin sensitivity measured by a gold-standard euglycaemic insulin clamp, together with simulations examining the behaviour of profile-likelihood-based confidence intervals. MLE was shown to be robust to non-normal distributions and efficient in small-sample situations. Copyright (c) 2008 John Wiley & Sons, Ltd.
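A sketch of the simple moment correction that the paper's MLE improves on: the observed slope is divided by the reliability ratio, estimated as the slope of the second measurement regressed on the first within the reliability substudy. The data and sample sizes are invented.

```python
# Sketch: regression dilution and a reliability-substudy correction.
import numpy as np

rng = np.random.default_rng(8)
n, n_rel = 400, 100
x_true = rng.normal(size=n)
y = 1.0 * x_true + rng.normal(scale=0.5, size=n)   # true slope = 1
x1 = x_true + rng.normal(scale=0.7, size=n)        # error-prone 1st measurement
x2 = x_true + rng.normal(scale=0.7, size=n)        # replicate 2nd measurement

beta_obs = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)          # attenuated slope

# Reliability ratio lambda estimated from the first n_rel replicate pairs:
# slope of the regression of the 2nd measurement on the 1st.
lam = np.cov(x1[:n_rel], x2[:n_rel])[0, 1] / np.var(x1[:n_rel], ddof=1)
print("observed slope:", beta_obs, "corrected slope:", beta_obs / lam)
```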

12.
Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While their widespread popularity stems from prediction performance, an equally important feature is that they provide a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and to construct confidence intervals. The method is general enough to apply to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and, in particular, find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates owing to its bias-correction properties. These two estimators are highly competitive with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, making it especially attractive for big-data settings.
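A sketch of the subsampling idea, using scikit-learn's permutation importance as the VIMP and the usual (b/n) rescaling of the subsample spread, which assumes a root-n convergence rate. The paper's delete-d jackknife refinement is not implemented here.

```python
# Sketch: subsampling variance for random-forest permutation importance (VIMP).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(9)
n, b, K = 600, 150, 50                     # sample size, subsample size, draws
X = rng.normal(size=(n, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

def vimp(Xs, ys):
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xs, ys)
    return permutation_importance(rf, Xs, ys, n_repeats=5,
                                  random_state=0).importances_mean

sub = np.array([vimp(X[idx], y[idx])
                for idx in (rng.choice(n, b, replace=False) for _ in range(K))])
var_hat = (b / n) * sub.var(axis=0, ddof=1)   # subsampling variance estimate
print("VIMP:", vimp(X, y).round(3),
      "subsampling SE:", np.sqrt(var_hat).round(3))
```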

13.
Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. Copyright © 2016 John Wiley & Sons, Ltd.

14.
Xing B, Ganju J. Statistics in Medicine 2005;24(12):1807-1814.
Blinded estimation of variance allows the sample size to be changed without compromising the integrity of the trial. Some methods that estimate the variance in a blinded manner either make untenable assumptions or are applicable only to two-treatment trials. We propose a new method for continuous endpoints that makes minimal assumptions. The method uses the enrollment order of subjects and the randomization block size to estimate the variance. It can be applied to normal or non-normal data, trials with two or more treatments, equal or unequal allocation schemes, fixed or random randomization block sizes, and single or multi-centre trials. The variance estimator is unbiased and performs best when the randomization block size is smallest. Simulation results suggest that the proposed estimator is expected to perform well for many commonly used randomization block sizes. The proposed method is used to estimate the variance of the endpoint for two trials and is shown to perform well by comparison with its unblinded counterpart.

15.
Propensity score (PS) methods have been used extensively to adjust for confounding factors in the statistical analysis of observational data in comparative effectiveness research. There are four major PS-based adjustment approaches: PS matching, PS stratification, covariate adjustment by PS, and PS-based inverse probability weighting. Although covariate adjustment by PS is one of the most frequently used PS-based methods in clinical research, the conventional variance estimator for the treatment effect estimate under covariate adjustment by PS is biased. As Stampf et al. have shown, this bias in variance estimation is likely to lead to invalid statistical inference and could result in erroneous public health conclusions (e.g., in food and drug safety and adverse event surveillance). To address this issue, we propose a two-stage analytic procedure that yields a valid variance estimator for the covariate-adjustment-by-PS analysis strategy. We also provide a simple empirical bootstrap resampling scheme. Both proposed procedures are implemented in an R function for public use. Extensive simulation results demonstrate the bias of the conventional variance estimator and show that both proposed variance estimators provide valid estimates of the true variance and are robust to complex confounding structures. The proposed methods are illustrated with a post-surgery pain study. Copyright © 2016 John Wiley & Sons, Ltd.

16.
Generalized estimating equations (GEEs) are commonly used to estimate transition models. When the Markov assumption does not hold but first-order transition probabilities are still of interest, the transition inference is sensitive to the choice of working correlation. In this paper, we consider a random process transition model as the true underlying data generating mechanism, which characterizes subject heterogeneity and the complex dependence structure of the outcome process in a very flexible way. We formally define two types of transition probabilities at the population level: "naive transition probabilities" that average across all the transitions and "population-average transition probabilities" that average the subject-specific transition probabilities. Through asymptotic bias calculations and finite-sample simulations, we demonstrate that the unstructured working correlation provides unbiased estimators of the population-average transition probabilities, while the independence working correlation provides unbiased estimators of the naive transition probabilities. For population-average transition estimation, we demonstrate that the sandwich estimator fails for unstructured GEE and recommend the use of either jackknife or bootstrap variance estimates. The proposed method is motivated by and applied to the NEXT Generation Health Study, where the interest is in estimating the population-average transition probabilities of alcohol use in adolescents.

17.
In this paper, we propose a hybrid variance estimator for the Kaplan-Meier survival function. This new estimator approximates the true variance by a binomial variance formula, in which the proportion parameter is a piecewise non-increasing function of the Kaplan-Meier survival function and its upper bound, and the effective sample size equals the number of subjects not censored prior to that time. In addition, we consider an adjusted hybrid variance estimator that modifies the regular estimator for small sample sizes. We present a simulation study comparing the performance of the regular and adjusted hybrid variance estimators with the Greenwood and Peto variance estimators for small sample sizes. On average, the hybrid variance estimators give variance estimates closer to the true values than the traditional estimators, and hence confidence intervals constructed with the hybrid variance estimators have coverage rates closer to nominal. Indeed, the Greenwood and Peto variance estimators can substantially underestimate the true variance in the left and right tails of the survival distribution, even with moderately censored data. Finally, we illustrate the use of the hybrid and traditional variance estimators on a data set from a leukaemia clinical trial.
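Greenwood's formula and a binomial-style variance S(t)(1-S(t))/n'(t), with n'(t) counting subjects not censored before t, can be sketched directly; the binomial form below is a simplified stand-in for the paper's hybrid estimator, whose proportion parameter is the piecewise function described above.

```python
# Sketch: Greenwood variance for the Kaplan-Meier estimator next to a simple
# binomial-style variance with effective sample size n'(t). Synthetic data.
import numpy as np

def km_with_variances(time, event):
    order = np.argsort(time)
    time, event = time[order], event[order]
    n = len(time)
    s, green_sum = 1.0, 0.0
    for i, (t, d) in enumerate(zip(time, event)):
        at_risk = n - i
        if d:
            s *= 1 - 1 / at_risk
            green_sum += 1 / (at_risk * (at_risk - 1)) if at_risk > 1 else 0.0
        greenwood = s ** 2 * green_sum               # Greenwood's formula
        n_eff = np.sum((event == 1) | (time >= t))   # not censored before t
        binom = s * (1 - s) / n_eff                  # binomial-style variance
        yield t, s, greenwood, binom

rng = np.random.default_rng(10)
t = rng.exponential(1.0, 60)
c = rng.exponential(1.5, 60)
time, event = np.minimum(t, c), (t <= c).astype(int)
for row in list(km_with_variances(time, event))[::15]:
    print("t=%.2f  S=%.3f  Greenwood var=%.4f  binomial var=%.4f" % row)
```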

18.
When we synthesize research findings via meta-analysis, it is common to assume that the true underlying effect differs across studies. Total variability consists of the within-study and between-study variances (heterogeneity). Established measures, such as I2, quantify the proportion of total variation attributed to heterogeneity, and a plethora of methods is available for estimating heterogeneity itself. The widely used DerSimonian and Laird estimation method has been challenged, but knowledge of the overall performance of heterogeneity estimators is incomplete. We identified 20 heterogeneity estimators in the literature and evaluated their performance via a simulation study in terms of mean absolute estimation error, coverage probability, and length of the confidence interval for the summary effect. Although previous simulation studies have suggested the Paule-Mandel estimator, it has not been compared with all the available estimators. For dichotomous outcomes, estimating heterogeneity through Markov chain Monte Carlo is a good choice if an informative prior distribution for heterogeneity is employed (e.g., from published Cochrane reviews). The nonparametric bootstrap and positive DerSimonian and Laird estimators perform well on all assessment criteria for both dichotomous and continuous outcomes. The Hartung-Makambi estimator can be the best choice when heterogeneity values are close to 0.07 for dichotomous outcomes and in the medium range (0.01, 0.05) for continuous outcomes. Hence, there are heterogeneity estimators (nonparametric bootstrap DerSimonian and Laird, and positive DerSimonian and Laird) that perform better than the suggested Paule-Mandel. Maximum likelihood provides the best performance for both types of outcome in the absence of heterogeneity.
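For reference, a sketch of the baseline DerSimonian-Laird estimator (truncated at zero) with the corresponding random-effects pooled estimate; "positive DL" variants replace the zero truncation with a small positive floor, with details varying by author. The effect sizes below are invented.

```python
# Sketch: DerSimonian-Laird tau^2 and the random-effects pooled estimate.
import numpy as np

def dersimonian_laird(effects, variances):
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1 / v                                           # fixed-effect weights
    mu_fixed = (w * y).sum() / w.sum()
    q = (w * (y - mu_fixed) ** 2).sum()                 # Cochran's Q
    denom = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max(0.0, (q - (len(y) - 1)) / denom)         # truncated at 0
    w_re = 1 / (v + tau2)                               # random-effects weights
    mu_re = (w_re * y).sum() / w_re.sum()
    return tau2, mu_re, np.sqrt(1 / w_re.sum())         # tau^2, pooled, SE

effects = np.array([0.30, 0.15, 0.45, 0.05, 0.25])      # per-study effect sizes
variances = np.array([0.02, 0.03, 0.04, 0.02, 0.05])
print(dersimonian_laird(effects, variances))
```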

19.
Propensity-score matching is frequently used to estimate the effect of treatments, exposures, and interventions from observational data. An important issue in propensity-score matching is how to estimate the standard error of the estimated treatment effect: accurate variance estimation permits confidence intervals with the advertised coverage rates and significance tests with the correct type I error rates. There is disagreement in the literature as to how standard errors should be estimated. The bootstrap is a commonly used resampling method for estimating the sampling variability of estimated parameters, yet bootstrap methods are rarely used in conjunction with propensity-score matching. We propose two different bootstrap methods for propensity-score matching without replacement and examine their performance in a series of Monte Carlo simulations. The first method draws bootstrap samples from the matched pairs in the propensity-score-matched sample. The second draws bootstrap samples from the original sample, estimates the propensity score separately in each bootstrap sample, and creates a matched sample within each. The former approach was found to yield estimates of the standard error closer to the empirical standard deviation of the sampling distribution of estimated effects. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons, Ltd.
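A sketch of the first (matched-pairs) bootstrap described above: 1:1 nearest-neighbour matching on the propensity score without replacement, then resampling the matched pairs. The greedy matcher is an illustrative simplification, not the authors' implementation, and the data are synthetic.

```python
# Sketch: greedy 1:1 PS matching without replacement, then a pair bootstrap.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n = 1000
x = rng.normal(size=n)
z = rng.binomial(1, 1 / (1 + np.exp(-(x - 0.5))))
y = 1.0 * z + x + rng.normal(size=n)                    # true effect = 1

ps = LogisticRegression().fit(x[:, None], z).predict_proba(x[:, None])[:, 1]
treated, controls = np.where(z == 1)[0], np.where(z == 0)[0]

# Greedy 1:1 matching on the PS, without replacement.
pairs, available = [], set(controls)
for i in treated:
    cand = min(available, key=lambda j: abs(ps[i] - ps[j]))
    pairs.append((i, cand))
    available.discard(cand)
pairs = np.array(pairs)

diffs = y[pairs[:, 0]] - y[pairs[:, 1]]                 # within-pair differences
boot = [diffs[rng.integers(0, len(diffs), len(diffs))].mean()
        for _ in range(1000)]
print("ATT estimate:", diffs.mean(), "pair-bootstrap SE:", np.std(boot, ddof=1))
```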

20.
The relative concentration index (RCI) and the absolute concentration index (ACI) have been widely used for monitoring health disparities over ranked health determinants. The RCI has been extended to allow value judgments about inequality aversion, by Pereira in 1998 and by Wagstaff in 2002. Previous studies of the extended RCI have focused on survey sample data. This paper adapts the extended RCI for use with directly standardized rates (DSRs) calculated from population-based surveillance data. A Taylor series linearization (TL)-based variance estimator is developed and evaluated using simulations, with a simulation-based Monte Carlo (MC) variance estimator evaluated as a comparison. Following Wagstaff's 1991 approach, we extend the ACI for use with DSRs. In all simulations, both the TL and MC methods produce valid variance estimates. The TL variance estimator has a simple, closed form that is attractive to users without sophisticated programming skills. The TL and MC estimators have been incorporated into a beta version of the National Cancer Institute's Health Disparities Calculator, a free statistical software tool that enables the estimation of 11 commonly used summary measures of health disparities for DSRs.
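The basic (unextended) indices are easy to sketch from individual-level data via fractional ranks; the aversion-weighted extensions and the DSR adaptation in the paper build on this form. The data below are synthetic with a pro-rich gradient.

```python
# Sketch: relative (RCI) and absolute (ACI) concentration indices from
# individual-level data, using fractional ranks of a socioeconomic variable.
import numpy as np

def concentration_indices(health, ses):
    order = np.argsort(ses)                       # rank by socioeconomic status
    y = np.asarray(health, dtype=float)[order]
    n = len(y)
    r = (np.arange(1, n + 1) - 0.5) / n           # fractional ranks in (0, 1)
    aci = 2 * np.cov(y, r, ddof=0)[0, 1]          # absolute concentration index
    rci = aci / y.mean()                          # relative concentration index
    return rci, aci

rng = np.random.default_rng(12)
ses = rng.uniform(size=500)
health = 50 + 20 * ses + rng.normal(scale=5, size=500)   # pro-rich gradient
print("RCI, ACI:", concentration_indices(health, ses))
```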
