Similar Articles
20 similar articles found.
1.
Although missing outcome data are an important problem in randomized trials and observational studies, methods to address this issue can be difficult to apply. Using simulated data, the authors compared 3 methods to handle missing outcome data: 1) complete case analysis; 2) single imputation; and 3) multiple imputation (all 3 with and without covariate adjustment). Simulated scenarios focused on continuous or dichotomous missing outcome data from randomized trials or observational studies. When outcomes were missing at random, single and multiple imputations yielded unbiased estimates after covariate adjustment. Estimates obtained by complete case analysis with covariate adjustment were unbiased as well, with coverage close to 95%. When outcome data were missing not at random, all methods gave biased estimates, but handling missing outcome data by means of 1 of the 3 methods reduced bias compared with a complete case analysis without covariate adjustment. Complete case analysis with covariate adjustment and multiple imputation yield similar estimates in the event of missing outcome data, as long as the same predictors of missingness are included. Hence, complete case analysis with covariate adjustment can and should be used as the analysis of choice more often. Multiple imputation, in addition, can accommodate the missing-not-at-random scenario more flexibly, making it especially suited for sensitivity analyses.
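The MAR scenario above is easy to reproduce in a few lines. Below is a minimal Python sketch (illustrative only, not the authors' simulation design): the outcome depends on a covariate, missingness in the outcome depends on the same covariate, and the covariate-adjusted complete case estimate recovers the true mean while the unadjusted one does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)       # true marginal mean of Y is 2.0

miss = rng.random(n) < 1 / (1 + np.exp(-x))  # MAR: P(missing) rises with x
y_obs, x_obs = y[~miss], x[~miss]

print("unadjusted complete case mean:", y_obs.mean())  # biased downward
slope, intercept = np.polyfit(x_obs, y_obs, 1)
print("covariate-adjusted estimate  :", intercept + slope * x.mean())  # ~2.0
```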

2.
BACKGROUND: Most systematic reviewers aim to perform an intention-to-treat meta-analysis, including all randomized participants from each trial. This is not straightforward in practice: reviewers must decide how to handle missing outcome data in the contributing trials. OBJECTIVE: To investigate methods of allowing for uncertainty due to missing data in a meta-analysis. STUDY DESIGN AND SETTING: The Cochrane Library was surveyed to assess current use of imputation methods. We developed a methodology for incorporating uncertainty, with weights assigned to trials based on uncertainty interval widths. The uncertainty interval for a trial incorporates both sampling error and the potential impact of missing data. We evaluated the performance of this method using simulated data. RESULTS: The survey showed that complete-case analysis is commonly considered alongside best-worst case analysis. Best-worst case analysis gives an interval for the treatment effect that includes all of the uncertainty due to missing data. Unless there are few missing data, this interval is very wide. Simulations show that the uncertainty method consistently has better power and narrower interval widths than best-worst case analysis. CONCLUSION: The uncertainty method performs consistently better than best-worst case imputation and should be considered along with complete-case analysis whenever missing data are a concern.
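For a single trial with a binary outcome, the best-worst case interval discussed above can be computed directly. A hedged sketch (function name and counts are invented; "best" treats the event as favorable for the treatment arm):

```python
def best_worst_interval(e_t, n_t, m_t, e_c, n_c, m_c):
    """Risk-difference bounds (treatment minus control).

    e_*: observed events, n_*: observed totals, m_*: missing counts."""
    best = (e_t + m_t) / (n_t + m_t) - e_c / (n_c + m_c)
    worst = e_t / (n_t + m_t) - (e_c + m_c) / (n_c + m_c)
    return worst, best

# With few missing cases the interval is narrow; it widens quickly as m_t and
# m_c grow, which is the paper's motivation for the uncertainty method.
print(best_worst_interval(e_t=30, n_t=90, m_t=10, e_c=25, n_c=85, m_c=15))
```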

3.
This paper presents a practical approach to analyzing incomplete quality of life (QOL) data containing non-ignorable dropouts in patients with advanced non-small-cell lung cancer (NSCLC). QOL scores for the physical domain at baseline and at the end of the first and second courses of chemotherapy were compared between two treatment groups in a phase III trial. One hundred eligible patients were randomized to receive cisplatin and irinotecan (CPT-P) and 103 to receive cisplatin and vindesine; of those two groups, 83 and 85, respectively, completed a QOL questionnaire at least at baseline. Multiple imputation incorporating auxiliary QOL variables was implemented as one of several sensitivity analyses, alongside complete case, available case, and pattern mixture analyses. Although greater sensitivity to missing data was found for the CPT-P arm, none of the alternative analyses demonstrated a significant difference between the groups in estimated slopes over time. This study presents an analytical approach for dealing with the complex problem of missing QOL data. It must be noted, however, that the validity of the multiple imputation method we present is not certain unless sufficiently informative auxiliary variables can be specified to convert the non-ignorable missingness into ignorable missingness.

4.
Multiple imputation can be a good solution to handling missing data if data are missing at random. However, this assumption is often difficult to verify. We describe an application of multiple imputation that makes this assumption plausible. This procedure requires contacting a random sample of subjects with incomplete data to fill in the missing information, and then adjusting the imputation model to incorporate the new data. Simulations with missing data that were decidedly not missing at random showed, as expected, that the method restored the original beta coefficients, whereas other methods of dealing with missing data failed. Using a dataset with real missing data, we found that different approaches to imputation produced moderately different results. Simulations suggest that filling in 10% of the data that were initially missing is sufficient for imputation in many epidemiologic applications and should produce approximately unbiased results, provided there is a high response rate on follow-up in the subsample of those with originally missing data. This response rate can probably be achieved if the follow-up data collection is planned as the initial approach to dealing with the missing data, rather than undertaken at a later stage, after repeated attempts have left only data that are very difficult to complete.

5.
Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been increasing rapidly in medical studies. Although MI is appealing in practice, since ordinary statistical methods can be applied to the completed data set once the missing values are imputed, the choice of imputation model remains problematic. If the missing values are imputed from some parametric model, the validity of the imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have also been proposed for MI, but it is not straightforward to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI that yields a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which imputation values are easily produced and to make a proper correction in the likelihood function after the imputation, using as a weight the density ratio between the imputation model and the true conditional density function of the missing variable. Although the conditional density must be estimated nonparametrically, it is not used for the imputation itself. The performance of our method is evaluated both theoretically and by simulation studies. A real data analysis using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset is also conducted to illustrate the method.
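The density-ratio idea can be pictured in one dimension. The sketch below is schematic, not the authors' estimator: draws come from a deliberately misspecified normal imputation model, a kernel estimate of the true density supplies the correction weights, and a tail probability shows the effect of the re-weighting. All names and numbers are illustrative.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(1)
y_obs = rng.gamma(2.0, 1.0, size=2000)        # skewed "observed" values

g = norm(loc=y_obs.mean(), scale=y_obs.std()) # convenient but misspecified model
y_imp = g.rvs(size=5000, random_state=rng)    # easy-to-draw imputations

f_hat = gaussian_kde(y_obs)                   # nonparametric density estimate
w = f_hat(y_imp) / g.pdf(y_imp)               # density-ratio weights

# True P(Y > 5) is about 0.04 for Gamma(2, 1); the normal model understates it.
print("unweighted tail estimate:", np.mean(y_imp > 5))
print("weighted tail estimate  :", np.average(y_imp > 5, weights=w))
```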

6.
Hollis S. Statistics in Medicine 2002;21(24):3823-3834
Many clinical trials are analysed using an intention-to-treat (ITT) approach. A full application of the ITT approach is only possible when complete outcome data are available for all randomized subjects. In a recent survey of clinical trial reports including an ITT analysis, complete case analysis (excluding all patients with a missing response) was common. This does not comply with the basic principles of ITT, since not all randomized subjects are included in the analysis. Analyses of data with missing values are based on untestable assumptions, and so a sensitivity analysis presenting a range of estimates under alternative assumptions about the missing-data mechanism is recommended. For binary outcomes, extreme case analysis has been suggested as a simple form of sensitivity analysis, but it is rarely conclusive. A graphical sensitivity analysis is proposed that displays the results of all possible allocations of cases with missing binary outcomes. An extension allowing for binomial variation in the outcome is also considered. The display is based on easily interpretable parameters and allows informal examination of the effects of varying prior beliefs.
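The proposed display rests on a simple enumeration: every possible allocation of the missing binary outcomes yields one recomputed effect estimate. A minimal sketch of that underlying computation (counts invented; the actual plotting is omitted):

```python
import numpy as np

e_t, n_t, m_t = 40, 100, 12   # treatment arm: events, observed n, missing n
e_c, n_c, m_c = 35, 100, 15   # control arm

rd = np.empty((m_t + 1, m_c + 1))
for i in range(m_t + 1):        # i missing treatment cases counted as events
    for j in range(m_c + 1):    # j missing control cases counted as events
        rd[i, j] = (e_t + i) / (n_t + m_t) - (e_c + j) / (n_c + m_c)

print("risk difference ranges from", rd.min(), "to", rd.max())
```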

7.
BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases. METHODS: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error, and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses. RESULTS: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate's effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias, but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure. CONCLUSIONS: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.

8.
Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non‐normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non‐parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non‐normal data. We generated data from a range of non‐normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero‐skewness log, Box–Cox or non‐parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term‐born infants. The results provide novel empirical evidence that the decision regarding how to impute a non‐normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non‐linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching.
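To make the PMM approach concrete, here is a deliberately simplified sketch: it matches on predictions from a single least-squares fit (closer to type-0 matching than the type-1/2 variants compared above) and borrows the observed value of one of the k nearest donors. Names and data are illustrative.

```python
import numpy as np

def pmm_impute(y, X, rng, k=5):
    """Simplified PMM: for each missing case, find the k observed cases with
    the closest predicted mean and borrow one donor's observed value."""
    obs = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta                        # predicted means for all cases
    y_out = y.copy()
    for i in np.flatnonzero(~obs):
        dist = np.abs(pred[obs] - pred[i])
        donors = np.argsort(dist)[:k]      # k nearest observed cases
        y_out[i] = y[obs][rng.choice(donors)]
    return y_out

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = np.exp(1 + x + rng.normal(scale=0.5, size=200))  # skewed variable
y[rng.random(200) < 0.5] = np.nan                    # 50% missing
X = np.column_stack([np.ones(200), x])
y_completed = pmm_impute(y, X, rng)
```

Because imputed values are always borrowed from observed donors, PMM respects the skewness of the raw data without any transformation, which is why it serves as a useful alternative above.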

9.
It is common for longitudinal clinical trials to face problems of item non-response, unit non-response, and drop-out. In this paper, we compare two alternative methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression. One approach combines hot-deck (HD) multiple imputation using a predictive mean matching method for item non-response and the approximate Bayesian bootstrap for unit non-response. A second method is based on a multivariate normal (MVN) model using PROC MI in SAS software V8.2. These two methods are contrasted with a last observation carried forward (LOCF) technique and available-case (AC) analysis in a simulation study where replicate analyses are performed on subsets of the originally complete cases. Missing-data patterns were simulated to be consistent with missing-data patterns found in the originally incomplete cases, and observed complete data means were taken to be the targets of estimation. Not surprisingly, the LOCF and AC methods had poor coverage properties for many of the variables evaluated. Multiple imputation under the MVN model performed well for most variables but produced less than nominal coverage for variables with highly skewed distributions. The HD method consistently produced close to nominal coverage, with interval widths that were roughly 7 per cent larger on average than those produced from the MVN model.
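The approximate Bayesian bootstrap used above for unit non-response admits a very short sketch (illustrative only; the paper combines it with hot-deck predictive mean matching for item non-response):

```python
import numpy as np

def abb_impute(y, rng):
    """Approximate Bayesian bootstrap: resample the observed donors with
    replacement, then draw each missing value from that resample."""
    is_mis = np.isnan(y)
    obs = y[~is_mis]
    donors = rng.choice(obs, size=obs.size, replace=True)  # step 1: bootstrap
    y_out = y.copy()
    y_out[is_mis] = rng.choice(donors, size=is_mis.sum(), replace=True)
    return y_out

rng = np.random.default_rng(0)
y = np.array([1.2, np.nan, 0.7, 2.1, np.nan, 1.5])
print(abb_impute(y, rng))
```

The initial bootstrap step is what distinguishes this from a plain hot deck: it propagates uncertainty about the donor pool itself, so repeated imputations vary appropriately.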

10.
In large epidemiological studies missing data can be a problem, especially if information is sought on a sensitive topic or when a composite measure is calculated from several variables each affected by missing values. Multiple imputation is the method of choice for 'filling in' missing data based on associations among variables. Using an example about body mass index from the Australian Longitudinal Study on Women's Health, we identify a subset of variables that are particularly useful for imputing values for the target variables. Then we illustrate two uses of multiple imputation. The first is to examine and correct for bias when data are not missing completely at random. The second is to impute missing values for an important covariate; in this case omission from the imputation process of variables to be used in the analysis may introduce bias. We conclude with several recommendations for handling issues of missing data.

11.
The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables, which formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post‐imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software.
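One simple way to picture "imputing from a distribution of imputation models" is a draw-specific log-odds offset: each imputation first samples an offset expressing beliefs about non-response, then imputes the binary values under that shifted model. This is only a schematic sketch — the normal prior on the offset and all numbers are invented, and the paper's nested-MI variance rules are not shown.

```python
import numpy as np

rng = np.random.default_rng(7)
p_mar = 0.30        # event probability for missing cases under a MAR model
M = 20              # number of imputations

imputations = []
for _ in range(M):
    delta = rng.normal(0.0, 0.5)                 # draw a mechanism offset
    logit = np.log(p_mar / (1 - p_mar)) + delta  # shift the log-odds
    p_m = 1 / (1 + np.exp(-logit))
    imputations.append(rng.binomial(1, p_m, size=50))  # 50 missing cases

# Between-imputation variability now reflects mechanism uncertainty as well.
print(np.mean([imp.mean() for imp in imputations]))
```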

12.
OBJECTIVES: Baseline imbalances between treatment groups can occur for one or more variables in a randomized controlled trial, although the detection and handling of baseline imbalances remain controversial. If trials with baseline imbalances are combined in a meta-analysis, this may result in misleading conclusions. STUDY DESIGN AND SETTING: The identification and consequences of baseline imbalances in meta-analyses are discussed. Meta-regression using mean baseline scores as a covariate is proposed as a potential method for adjusting for baseline imbalances within a meta-analysis. A recent systematic review examining the effect of calcium supplements on weight is used as an illustrative case study. RESULTS: A meta-analysis using the mean final values of the treatment groups as the outcome produced an apparent, statistically significant treatment effect. A meta-analysis of the baseline values, however, showed that this was due to the baseline imbalance between treatment groups rather than to any intervention received by the participants. Applying meta-regression demonstrated that there was in fact a smaller, statistically non-significant difference between treatment groups. CONCLUSION: The meta-analyst should always consider the possibility of baseline imbalances, and adjustments should be made wherever possible.
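A hedged sketch of the proposed adjustment, using fixed-effect weighted least squares as a stand-in for a full meta-regression (all numbers invented): trial effects are regressed on the mean baseline score, and with the covariate centred, the intercept estimates the effect adjusted to the average baseline level.

```python
import numpy as np
import statsmodels.api as sm

baseline = np.array([72.1, 70.4, 74.8, 71.6])  # mean baseline score per trial
effect = np.array([0.9, 0.1, 1.4, 0.5])        # trial treatment effects
var = np.array([0.04, 0.09, 0.06, 0.05])       # variances of the effects

X = sm.add_constant(baseline - baseline.mean())   # centre the covariate
fit = sm.WLS(effect, X, weights=1 / var).fit()    # inverse-variance weights
print(fit.params)  # intercept = effect adjusted to the average baseline score
```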

13.
Cost and effect data often have missing values because economic evaluations are frequently added onto clinical studies in which cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate for missing cost-effectiveness data in a randomized controlled trial. Three incomplete data sets were generated from a complete reference data set with 17%, 35%, and 50% missing data in effects and costs. The strategies evaluated included complete case analysis (CCA), multiple imputation with predictive mean matching (MI-PMM), MI-PMM on log-transformed costs (log MI-PMM), and a two-step MI. Mean cost and effect estimates, standard errors, and incremental net benefits were compared with the results of the analyses on the complete reference data set. The CCA, MI-PMM, and two-step MI strategies diverged from the reference results as the amount of missing data increased. In contrast, the estimates from the log MI-PMM strategy remained stable irrespective of the amount of missing data. MI provided better estimates than CCA in all scenarios. With low amounts of missing data the MI strategies appeared equivalent, but we recommend log MI-PMM when more than 35% of the data are missing.

14.
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within‐study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse‐variance weighted meta‐analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between‐study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse‐variance weighted meta‐analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta‐analysis, rather than meta‐analyzing each of the multiple imputations and then combining the meta‐analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration.
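Applying Rubin's rules within each study before the inverse-variance meta-analysis, as recommended above, needs only the per-imputation point estimates and variances from that study. A minimal sketch (toy numbers):

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool M imputation-specific estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = np.mean(estimates)        # pooled point estimate
    w_bar = np.mean(variances)        # within-imputation variance
    b = np.var(estimates, ddof=1)     # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, total_var

# One study's three imputation-specific estimates, pooled before meta-analysis
est, var = rubins_rules([0.52, 0.47, 0.55], [0.010, 0.012, 0.011])
print(est, var)
```

Each study's pooled `(est, var)` pair then enters the meta-analysis as an ordinary inverse-variance-weighted data point.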

15.
Multiple imputation of baseline data in the Cardiovascular Health Study
Most epidemiologic studies will encounter missing covariate data. Software packages typically used for analyzing data delete any cases with a missing covariate to perform a complete case analysis. The deletion of cases complicates variable selection when different variables are missing on different cases, reduces power, and creates the potential for bias in the resulting estimates. Recently, software has become available for producing multiple imputations of missing data that account for the between-imputation variability. The implementation of the software to impute missing baseline data in the setting of the Cardiovascular Health Study, a large, observational study, is described. Results of exploratory analyses using the imputed data were largely consistent with results using only complete cases, even in a situation where one third of the cases were excluded from the complete case analysis. There were few differences in the exploratory results across three imputations, and the combined results from the multiple imputations were very similar to results from a single imputation. An increase in power was evident and variable selection simplified when using the imputed data sets.

16.
Incomplete data are generally a challenge to the analysis of most large studies. The current gold standard to account for missing data is multiple imputation, and more specifically multiple imputation with chained equations (MICE). Numerous studies have been conducted to illustrate the performance of MICE for missing covariate data. The results show that the method works well in various situations. However, less is known about its performance in more complex models, specifically when the outcome is multivariate as in longitudinal studies. In current practice, the multivariate nature of the longitudinal outcome is often neglected in the imputation procedure, or only the baseline outcome is used to impute missing covariates. In this work, we evaluate the performance of MICE using different strategies to include a longitudinal outcome into the imputation models and compare it with a fully Bayesian approach that jointly imputes missing values and estimates the parameters of the longitudinal model. Results from simulation and a real data example show that MICE requires the analyst to correctly specify which components of the longitudinal process need to be included in the imputation models in order to obtain unbiased results. The full Bayesian approach, on the other hand, does not require the analyst to explicitly specify how the longitudinal outcome enters the imputation models. It performed well under different scenarios.
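The contrast between baseline-only and full-trajectory imputation models can be sketched with scikit-learn's IterativeImputer as a rough stand-in for MICE (it is not the MICE software the paper evaluates, and the column layout and data are invented): the covariate is imputed once with only the baseline outcome in the model, and once with all repeated outcome measurements included.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n = 500
cov = rng.normal(size=n)
y = cov[:, None] + rng.normal(size=(n, 3)).cumsum(axis=1)  # 3 visits
cov[rng.random(n) < 0.4] = np.nan                          # 40% missing

baseline_only = np.column_stack([cov, y[:, 0]])   # imputation model: visit 1
all_visits = np.column_stack([cov, y])            # imputation model: all visits

imp1 = IterativeImputer(random_state=0).fit_transform(baseline_only)
imp2 = IterativeImputer(random_state=0).fit_transform(all_visits)
print(imp1[:, 0].mean(), imp2[:, 0].mean())  # imputed-covariate summaries
```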

17.
The intention-to-treat (ITT) approach to randomized controlled trials analyzes data on the basis of treatment assignment, not treatment receipt. Alternative approaches make comparisons according to the treatment received at the end of the trial (as-treated analysis) or using only subjects who did not deviate from the assigned treatment (adherers-only analysis). Using a sensitivity analysis on data for a hypothetical trial, we compare these different analytical approaches in the context of two common protocol deviations: loss to follow-up and switching across treatments. In each case, two rates of deviation are considered: 10% and 30%. The analysis shows that biased estimates of effect may occur when deviation is nonrandom, when a large percentage of participants switch treatments or are lost to follow-up, and when the method of estimating missing values accounts inadequately for the process causing loss to follow-up. In general, ITT analysis attenuates between-group effects. Trialists should use sensitivity analyses on their data and should compare the characteristics of participants who do and those who do not deviate from the trial protocol. The ITT approach is not a remedy for unsound design, and imputation of missing values is not a substitute for complete, good quality data.
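The three analysis sets compared above differ only in how participants are grouped or excluded. A toy sketch (column names and data invented):

```python
import pandas as pd

df = pd.DataFrame({
    "assigned": ["A", "A", "A", "B", "B", "B"],
    "received": ["A", "B", "A", "B", "B", "A"],  # two participants switched
    "outcome":  [1.0, 3.0, 2.0, 3.5, 4.0, 1.5],
})

itt = df.groupby("assigned")["outcome"].mean()         # by assignment
as_treated = df.groupby("received")["outcome"].mean()  # by treatment received
adherers = df[df.assigned == df.received]              # drop switchers
adherers_only = adherers.groupby("assigned")["outcome"].mean()
print(itt, as_treated, adherers_only, sep="\n")
```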

18.
BACKGROUND: Longitudinal studies almost always have some individuals with missing outcomes. Inappropriate handling of the missing data in the analysis can result in misleading conclusions. Here we review a wide range of methods to handle missing outcomes in single and repeated measures data and discuss which methods are most appropriate. METHODS: Using data from a randomized controlled trial to compare two interventions for increasing physical activity, we compare complete-case analysis; ad hoc imputation techniques such as last observation carried forward and worst-case; model-based imputation; longitudinal models with random effects; and recently proposed joint models for repeated measures data and non-ignorable dropout. RESULTS: Estimated intervention effects from ad hoc imputation methods vary widely. Standard multiple imputation and longitudinal modelling agree closely, as they should. Modifying the modelling method to allow for non-ignorable dropout had little effect on estimated intervention effects, but imputing using a common imputation model in both groups gave more conservative results. CONCLUSIONS: Results from ad hoc imputation methods should be avoided in favour of methods with more plausible assumptions although they may be computationally more complex. Although standard multiple imputation methods and longitudinal modelling methods are equivalent for estimating the treatment effect, the two approaches suggest different ways of relaxing the assumptions, and the choice between them depends on contextual knowledge.
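For concreteness, here is what last observation carried forward — one of the ad hoc methods the paper recommends avoiding — looks like on a wide-format repeated-measures table (data invented):

```python
import numpy as np
import pandas as pd

visits = pd.DataFrame({
    "week0": [5.0, 6.0, 4.0],
    "week4": [5.5, np.nan, 4.5],
    "week8": [np.nan, np.nan, 5.0],
})
locf = visits.ffill(axis=1)  # carry each subject's last value forward
print(locf)
```

LOCF treats a dropout's trajectory as flat from the last visit onward, which is exactly the implausible assumption that makes its estimates vary so widely from the model-based ones.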

19.
We propose a non-parametric multiple imputation scheme, NPMLE imputation, for the analysis of interval censored survival data. Features of the method are that it converts interval-censored data problems to complete data or right censored data problems to which many standard approaches can be used, and that measures of uncertainty are easily obtained. In addition to the event time of primary interest, there are frequently other auxiliary variables that are associated with the event time. For the goal of estimating the marginal survival distribution, these auxiliary variables may provide some additional information about the event time for the interval censored observations. We extend the imputation methods to incorporate information from auxiliary variables with potentially complex structures. To conduct the imputation, we use a working failure-time proportional hazards model to define an imputing risk set for each censored observation. The imputation schemes consist of using the data in the imputing risk sets to create an exact event time for each interval censored observation. In simulation studies we show that the use of multiple imputation methods can improve the efficiency of estimators and reduce the effect of missing visits when compared to simpler approaches. We apply the approach to cytomegalovirus shedding data from an AIDS clinical trial, in which CD4 count is the auxiliary variable.

20.
We consider a study‐level meta‐analysis with a normally distributed outcome variable and possibly unequal study‐level variances, where the object of inference is the difference in means between a treatment and control group. A common complication in such an analysis is missing sample variances for some studies. A frequently used approach is to impute the weighted (by sample size) mean of the observed variances (mean imputation). Another approach is to include only those studies with variances reported (complete case analysis). Both mean imputation and complete case analysis are only valid under the missing‐completely‐at‐random assumption, and even then the inverse variance weights produced are not necessarily optimal. We propose a multiple imputation method employing gamma meta‐regression to impute the missing sample variances. Our method takes advantage of study‐level covariates that may be used to provide information about the missing data. Through simulation studies, we show that multiple imputation, when the imputation model is correctly specified, is superior to competing methods in terms of confidence interval coverage probability and type I error probability when testing a specified group difference. Finally, we describe a similar approach to handling missing variances in cross‐over studies.
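A hedged sketch of the imputation model proposed above: a gamma GLM with log link regresses the observed study variances on a study-level covariate, and its predictions fill in the missing ones. A full multiple-imputation version would add proper draws (e.g., parameter draws plus gamma noise); that step is omitted here, and all numbers are invented.

```python
import numpy as np
import statsmodels.api as sm

n_per_arm = np.array([20, 45, 60, 80, 120])        # study-level covariate
variance = np.array([2.1, 1.4, np.nan, 0.9, np.nan])  # two variances missing

X = sm.add_constant(np.log(n_per_arm))
obs = ~np.isnan(variance)
glm = sm.GLM(variance[obs], X[obs],
             family=sm.families.Gamma(link=sm.families.links.Log())).fit()

variance_imp = np.where(obs, variance, glm.predict(X))  # fill in predictions
print(variance_imp)
```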
