首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Longitudinal studies of cognitive performance are sensitive to dropout, as participants experiencing cognitive deficits are less likely to attend study visits, which may bias estimated associations between exposures of interest and cognitive decline. Multiple imputation is a powerful tool for handling missing data, however its use for missing cognitive outcome measures in longitudinal analyses remains limited. We use multiple imputation by chained equations (MICE) to impute cognitive performance scores of participants who did not attend the 2011–2013 exam of the Atherosclerosis Risk in Communities Study. We examined the validity of imputed scores using observed and simulated data under varying assumptions. We examined differences in the estimated association between diabetes at baseline and 20-year cognitive decline with and without imputed values. Lastly, we discuss how different analytic methods (mixed models and models fit using generalized estimate equations) and choice of for whom to impute result in different estimands. Validation using observed data showed MICE produced unbiased imputations. Simulations showed a substantial reduction in the bias of the 20-year association between diabetes and cognitive decline comparing MICE (3–4 % bias) to analyses of available data only (16–23 % bias) in a construct where missingness was strongly informative but realistic. Associations between diabetes and 20-year cognitive decline were substantially stronger with MICE than in available-case analyses. Our study suggests when informative data are available for non-examined participants, MICE can be an effective tool for imputing cognitive performance and improving assessment of cognitive decline, though careful thought should be given to target imputation population and analytic model chosen, as they may yield different estimands.  相似文献   

2.
Multiple imputation is a popular method for addressing missing data, but its implementation is difficult when data have a multilevel structure and one or more variables are systematically missing. This systematic missing data pattern may commonly occur in meta‐analysis of individual participant data, where some variables are never observed in some studies, but are present in other hierarchical data settings. In these cases, valid imputation must account for both relationships between variables and correlation within studies. Proposed methods for multilevel imputation include specifying a full joint model and multiple imputation with chained equations (MICE). While MICE is attractive for its ease of implementation, there is little existing work describing conditions under which this is a valid alternative to specifying the full joint model. We present results showing that for multilevel normal models, MICE is rarely exactly equivalent to joint model imputation. Through a simulation study and an example using data from a traumatic brain injury study, we found that in spite of theoretical differences, MICE imputations often produce results similar to those obtained using the joint model. We also assess the influence of prior distributions in MICE imputation methods and find that when missingness is high, prior choices in MICE models tend to affect estimation of across‐study variability more than compatibility of conditional likelihoods. Copyright © 2017 John Wiley & Sons, Ltd.  相似文献   

3.
Longitudinal cohort studies often collect both repeated measurements of longitudinal outcomes and times to clinical events whose occurrence precludes further longitudinal measurements. Although joint modeling of the clinical events and the longitudinal data can be used to provide valid statistical inference for target estimands in certain contexts, the application of joint models in medical literature is currently rather restricted because of the complexity of the joint models and the intensive computation involved. We propose a multiple imputation approach to jointly impute missing data of both the longitudinal and clinical event outcomes. With complete imputed datasets, analysts are then able to use simple and transparent statistical methods and standard statistical software to perform various analyses without dealing with the complications of missing data and joint modeling. We show that the proposed multiple imputation approach is flexible and easy to implement in practice. Numerical results are also provided to demonstrate its performance. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   

4.
目的 简要介绍R 环境下MICE填补方法(Multivariate imputation by chained equations)的填补估算应用并评价其填补效果.方法以实际数据阐述填补估算流程,比较MICE与常见的缺失数据处理方法(删除法、均(众)数法、回归法)填补估算效果的差异.结果当数据缺失率为10%时,MICE与常见的缺失数据处理方法估算结果无明显差异,各填补方法的3种变量的回归系数估计的相对误差在10%左右.随着缺失率的增加(20%,40%),各方法回归系数估计的相对误差都增加,但MICE 3种变量的回归系数的相对误差稳定在10%~20%左右,MICE表现优于其他方法而且结果稳定,回归法次之,删除法和均(众)数法较差.当缺失率达50%时,3种类型的变量估算的误差已经较大,所有方法填补估算效果欠佳.结论 MICE较其他多重填补软件操作简便,与常见的缺失数据处理方法相比,可充分地利用缺失记录的信息,能较准确地反应调查的真实情况,值得在实际工作中推广应用.  相似文献   

5.
Background and ObjectivesAs a result of the development of sophisticated techniques, such as multiple imputation, the interest in handling missing data in longitudinal studies has increased enormously in past years. Within the field of longitudinal data analysis, there is a current debate on whether it is necessary to use multiple imputations before performing a mixed-model analysis to analyze the longitudinal data. In the current study this necessity is evaluated.Study Design and SettingThe results of mixed-model analyses with and without multiple imputation were compared with each other. Four data sets with missing values were created—one data set with missing completely at random, two data sets with missing at random, and one data set with missing not at random). In all data sets, the relationship between a continuous outcome variable and two different covariates were analyzed: a time-independent dichotomous covariate and a time-dependent continuous covariate.ResultsAlthough for all types of missing data, the results of the mixed-model analysis with or without multiple imputations were slightly different, they were not in favor of one of the two approaches. In addition, repeating the multiple imputations 100 times showed that the results of the mixed-model analysis with multiple imputation were quite unstable.ConclusionIt is not necessary to handle missing data using multiple imputations before performing a mixed-model analysis on longitudinal data.  相似文献   

6.
Missing covariates often occur in biomedical studies with survival outcomes. Multiple imputation via chained equations (MICE) is a semi‐parametric and flexible approach that imputes multivariate data by a series of conditional models, one for each incomplete variable. When applying MICE, practitioners tend to specify the conditional models in a simple fashion largely dictated by the software, which could lead to suboptimal results. Practical guidelines for specifying appropriate conditional models in MICE are lacking. Motivated by a study of time to hip fractures in the Women's Health Initiative Observational Study using accelerated failure time models, we propose and experiment with some rationales leading to appropriate MICE specifications. This strategy starts with specifying a joint model for the variables involved. We first derive the conditional distribution of each variable under the joint model, then approximate these conditional distributions to the extent which can be characterized by commonly used regression models. We propose to fit separate models to impute incomplete variables by the failure status, which is key to generating appropriate MICE specifications for survival outcomes. The proposed strategy can be conveniently implemented with all available imputation software that uses fully conditional specifications. Our simulation results show that some commonly used simple MICE specifications can produce suboptimal results, while those based on the proposed strategy appear to perform well and be robust toward model misspecifications. Hence, we warn against a mechanical use of MICE and suggest careful modeling of the conditional distributions of variables to ensure proper performance.  相似文献   

7.
Missing data are common in longitudinal studies due to drop‐out, loss to follow‐up, and death. Likelihood‐based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood‐based joint modeling approach is consistent and most efficient when correctly specified. However, mis‐specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis‐specified, though it is more robust to mis‐specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

8.
Multiple imputation by chained equations (MICE) has emerged as a leading strategy for imputing missing epidemiological data due to its ease of implementation and ability to maintain unbiased effect estimates and valid inference. Within the MICE algorithm, imputation can be performed using a variety of parametric or nonparametric methods. Literature has suggested that nonparametric tree-based imputation methods outperform parametric methods in terms of bias and coverage when there are interactions or other nonlinear effects among the variables. However, these studies fail to provide a fair comparison as they do not follow the well-established recommendation that any effects in the final analysis model (including interactions) should be included in the parametric imputation model. We show via simulation that properly incorporating interactions in the parametric imputation model leads to much better performance. In fact, correctly specified parametric imputation and tree-based random forest imputation perform similarly when estimating the interaction effect. Parametric imputation leads to slightly higher coverage for the interaction effect, but it has wider confidence intervals than random forest imputation and requires correct specification of the imputation model. Epidemiologists should take care in specifying MICE imputation models, and this paper assists in that task by providing a fair comparison of parametric and tree-based imputation in MICE.  相似文献   

9.
Missing data often occur in cross-sectional surveys and longitudinal and experimental studies. The purpose of this study was to compare the prediction of self-rated health (SRH), a robust predictor of morbidity and mortality among diverse populations, before and after imputation of the missing variable “yearly household income.” We reviewed data from 4,162 participants of Mexican origin recruited from July 1, 2002, through December 31, 2005, and who were enrolled in a population-based cohort study. Missing yearly income data were imputed using three different single imputation methods and one multiple imputation under a Bayesian approach. Of 4,162 participants, 3,121 were randomly assigned to a training set (to derive the yearly income imputation methods and develop the health-outcome prediction models) and 1,041 to a testing set (to compare the areas under the curve (AUC) of the receiver-operating characteristic of the resulting health-outcome prediction models). The discriminatory powers of the SRH prediction models were good (range, 69–72%) and compared to the prediction model obtained after no imputation of missing yearly income, all other imputation methods improved the prediction of SRH (P < 0.05 for all comparisons) with the AUC for the model after multiple imputation being the highest (AUC = 0.731). Furthermore, given that yearly income was imputed using multiple imputation, the odds of SRH as good or better increased by 11% for each $5,000 increment in yearly income. This study showed that although imputation of missing data for a key predictor variable can improve a risk health-outcome prediction model, further work is needed to illuminate the risk factors associated with SRH.  相似文献   

10.
The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables, which formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post‐imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software. Copyright © 2014 John Wiley & Sons, Ltd.  相似文献   

11.
Health economics studies with missing data are increasingly using approaches such as multiple imputation that assume that the data are “missing at random.” This assumption is often questionable, as—even given the observed data—the probability that data are missing may reflect the true, unobserved outcomes, such as the patients' true health status. In these cases, methodological guidelines recommend sensitivity analyses to recognise data may be “missing not at random” (MNAR), and call for the development of practical, accessible approaches for exploring the robustness of conclusions to MNAR assumptions. Little attention has been paid to the problem that data may be MNAR in health economics in general and in cost‐effectiveness analyses (CEA) in particular. In this paper, we propose a Bayesian framework for CEA where outcome or cost data are missing. Our framework includes a practical, accessible approach to sensitivity analysis that allows the analyst to draw on expert opinion. We illustrate the framework in a CEA comparing an endovascular strategy with open repair for patients with ruptured abdominal aortic aneurysm, and provide software tools to implement this approach.  相似文献   

12.
Multiple imputation is commonly used to impute missing covariate in Cox semiparametric regression setting. It is to fill each missing data with more plausible values, via a Gibbs sampling procedure, specifying an imputation model for each missing variable. This imputation method is implemented in several softwares that offer imputation models steered by the shape of the variable to be imputed, but all these imputation models make an assumption of linearity on covariates effect. However, this assumption is not often verified in practice as the covariates can have a nonlinear effect. Such a linear assumption can lead to a misleading conclusion because imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous time invariant covariates in imputation model, we propose a method based on B‐splines function. To assess the performance of this method, we conducted a simulation study, where we compared the multiple imputation method using Bayesian splines imputation model with multiple imputation using Bayesian linear imputation model in survival analysis setting. We evaluated the proposed method on the motivated data set collected in HIV‐infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well to estimate hazard ratio compared with the linear imputation methods, when data are missing completely at random, or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

13.
A multiple imputation strategy for incomplete longitudinal data   总被引:3,自引:0,他引:3  
Longitudinal studies are commonly used to study processes of change. Because data are collected over time, missing data are pervasive in longitudinal studies, and complete ascertainment of all variables is rare. In this paper a new imputation strategy for completing longitudinal data sets is proposed. The proposed methodology makes use of shrinkage estimators for pooling information across geographic entities, and of model averaging for pooling predictions across different statistical models. Bayes factors are used to compute weights (probabilities) for a set of models considered to be reasonable for at least some of the units for which imputations must be produced, imputations are produced by draws from the predictive distributions of the missing data, and multiple imputations are used to better reflect selected sources of uncertainty in the imputation process. The imputation strategy is developed within the context of an application to completing incomplete longitudinal variables in the so-called Area Resource File. The proposed procedure is compared with several other imputation procedures in terms of inferences derived with the imputations, and the proposed methodology is demonstrated to provide valid estimates of model parameters when the completed data are analysed. Extensions to other missing data problems in longitudinal studies are straightforward so long as the missing data mechanism can be assumed to be ignorable.  相似文献   

14.
BACKGROUND: Longitudinal studies almost always have some individuals with missing outcomes. Inappropriate handling of the missing data in the analysis can result in misleading conclusions. Here we review a wide range of methods to handle missing outcomes in single and repeated measures data and discuss which methods are most appropriate. METHODS: Using data from a randomized controlled trial to compare two interventions for increasing physical activity, we compare complete-case analysis; ad hoc imputation techniques such as last observation carried forward and worst-case; model-based imputation; longitudinal models with random effects; and recently proposed joint models for repeated measures data and non-ignorable dropout. RESULTS: Estimated intervention effects from ad hoc imputation methods vary widely. Standard multiple imputation and longitudinal modelling agree closely, as they should. Modifying the modelling method to allow for non-ignorable dropout had little effect on estimated intervention effects, but imputing using a common imputation model in both groups gave more conservative results. CONCLUSIONS: Results from ad hoc imputation methods should be avoided in favour of methods with more plausible assumptions although they may be computationally more complex. Although standard multiple imputation methods and longitudinal modelling methods are equivalent for estimating the treatment effect, the two approaches suggest different ways of relaxing the assumptions, and the choice between them depends on contextual knowledge.  相似文献   

15.
BackgroundStatistical analysis of a data set with missing data is a frequent problem to deal with in epidemiology. Methods are available to manage incomplete observations, avoiding biased estimates and improving their precision, compared to more traditional methods, such as the analysis of the sub-sample of complete observations.MethodsOne of these approaches is multiple imputation, which consists in imputing successively several values for each missing data item. Several completed data sets having the same distribution characteristics as the observed data (variability and correlations) are thus generated. Standard analyses are done separately on each completed dataset then combined to obtain a global result. In this paper, we discuss the various assumptions made on the origin of missing data (at random or not), and we present in a pragmatic way the process of multiple imputation. A recent method, Multiple Imputation by Chained Equations (MICE), based on a Monte-Carlo Markov Chain algorithm under missing at random data (MAR) hypothesis, is described. An illustrative example of the MICE method is detailed for the analysis of the relation between a dichotomous variable and two covariates presenting MAR data with no particular structure, through multivariate logistic regression.ResultsCompared with the original dataset without missing data, the results show a substantial improvement of the regression coefficient estimates with the MICE method, relatively to those obtained on the dataset with complete observations.ConclusionThis method does not require any direct assumption on joint distribution of the variables and it is presently implemented in standard statistical software (Splus, Stata). It can be used for multiple imputation of missing data of several variables with no particular structure.  相似文献   

16.
Pattern‐mixture models provide a general and flexible framework for sensitivity analyses of nonignorable missing data. The placebo‐based pattern‐mixture model (Little and Yau, Biometrics 1996; 52 :1324–1333) treats missing data in a transparent and clinically interpretable manner and has been used as sensitivity analysis for monotone missing data in longitudinal studies. The standard multiple imputation approach (Rubin, Multiple Imputation for Nonresponse in Surveys, 1987) is often used to implement the placebo‐based pattern‐mixture model. We show that Rubin's variance estimate of the multiple imputation estimator of treatment effect can be overly conservative in this setting. As an alternative to multiple imputation, we derive an analytic expression of the treatment effect for the placebo‐based pattern‐mixture model and propose a posterior simulation or delta method for the inference about the treatment effect. Simulation studies demonstrate that the proposed methods provide consistent variance estimates and outperform the imputation methods in terms of power for the placebo‐based pattern‐mixture model. We illustrate the methods using data from a clinical study of major depressive disorders. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

17.
We propose a transition model for analysing data from complex longitudinal studies. Because missing values are practically unavoidable in large longitudinal studies, we also present a two-stage imputation method for handling general patterns of missing values on both the outcome and the covariates by combining multiple imputation with stochastic regression imputation. Our model is a time-varying auto-regression on the past innovations (residuals), and it can be used in cases where general dynamics must be taken into account, and where the model selection is important. The entire estimation process was carried out using available procedures in statistical packages such as SAS and S-PLUS. To illustrate the viability of the proposed model and the two-stage imputation method, we analyse data collected in an epidemiological study that focused on various factors relating to childhood growth. Finally, we present a simulation study to investigate the behaviour of our two-stage imputation procedure.  相似文献   

18.
In this paper, we consider fitting semiparametric additive hazards models for case‐cohort studies using a multiple imputation approach. In a case‐cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this as a special case of a missing covariate by design. We propose to employ a popular incomplete data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For imputation models, an imputation modeling procedure based on a rejection sampling is developed. A simple imputation modeling that can naturally be applied to a general missing‐at‐random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, a misspecification aspect in imputation modeling is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   

19.
We consider a study‐level meta‐analysis with a normally distributed outcome variable and possibly unequal study‐level variances, where the object of inference is the difference in means between a treatment and control group. A common complication in such an analysis is missing sample variances for some studies. A frequently used approach is to impute the weighted (by sample size) mean of the observed variances (mean imputation). Another approach is to include only those studies with variances reported (complete case analysis). Both mean imputation and complete case analysis are only valid under the missing‐completely‐at‐random assumption, and even then the inverse variance weights produced are not necessarily optimal. We propose a multiple imputation method employing gamma meta‐regression to impute the missing sample variances. Our method takes advantage of study‐level covariates that may be used to provide information about the missing data. Through simulation studies, we show that multiple imputation, when the imputation model is correctly specified, is superior to competing methods in terms of confidence interval coverage probability and type I error probability when testing a specified group difference. Finally, we describe a similar approach to handling missing variances in cross‐over studies. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

20.
Individual participant data meta‐analyses (IPD‐MA) are increasingly used for developing and validating multivariable (diagnostic or prognostic) risk prediction models. Unfortunately, some predictors or even outcomes may not have been measured in each study and are thus systematically missing in some individual studies of the IPD‐MA. As a consequence, it is no longer possible to evaluate between‐study heterogeneity and to estimate study‐specific predictor effects, or to include all individual studies, which severely hampers the development and validation of prediction models. Here, we describe a novel approach for imputing systematically missing data and adopt a generalized linear mixed model to allow for between‐study heterogeneity. This approach can be viewed as an extension of Resche‐Rigon's method (Stat Med 2013), relaxing their assumptions regarding variance components and allowing imputation of linear and nonlinear predictors. We illustrate our approach using a case study with IPD‐MA of 13 studies to develop and validate a diagnostic prediction model for the presence of deep venous thrombosis. We compare the results after applying four methods for dealing with systematically missing predictors in one or more individual studies: complete case analysis where studies with systematically missing predictors are removed, traditional multiple imputation ignoring heterogeneity across studies, stratified multiple imputation accounting for heterogeneity in predictor prevalence, and multilevel multiple imputation (MLMI) fully accounting for between‐study heterogeneity. We conclude that MLMI may substantially improve the estimation of between‐study heterogeneity parameters and allow for imputation of systematically missing predictors in IPD‐MA aimed at the development and validation of prediction models. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号