首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
H Wu  L Wu 《Statistics in medicine》2001,20(12):1755-1769
We propose a three-step multiple imputation method, implemented by Gibbs sampler, for estimating parameters in non-linear mixed-effects models with missing covariates. Estimates obtained by the proposed multiple imputation method are compared to those obtained by the mean-value imputation method and the complete-case method through simulations. We find that the proposed multiple imputation method offers smaller biases and smaller mean-squared errors for the estimates of covariate coefficients compared to other two methods. We apply the three missing data methods to modelling HIV viral dynamics from an AIDS clinical trial. We believe that the results from the proposed multiple imputation method are more reliable than that from the other two commonly used methods.  相似文献   

2.
This research is motivated by studying the progression of age‐related macular degeneration where both a covariate and the response variable are subject to censoring. We develop a general framework to handle regression with censored covariate where the response can be different types and the censoring can be random or subject to (constant) detection limits. Multiple imputation is a popular technique to handle missing data that requires compatibility between the imputation model and the substantive model to obtain valid estimates. With censored covariate, we propose a novel multiple imputation‐based approach, namely, the semiparametric two‐step importance sampling imputation (STISI) method, to impute the censored covariate. Specifically, STISI imputes the missing covariate from a semiparametric accelerated failure time model conditional on fully observed covariates (Step 1) with the acceptance probability derived from the substantive model (Step 2). The 2‐step procedure automatically ensures compatibility and takes full advantage of the relaxed semiparametric assumption in the imputation. Extensive simulations demonstrate that the STISI method yields valid estimates in all scenarios and outperforms some existing methods that are commonly used in practice. We apply STISI on data from the Age‐related Eye Disease Study, to investigate the association between the progression time of the less severe eye and that of the more severe eye. We also illustrate the method by analyzing the urine arsenic data for patients from National Health and Nutrition Examination Survey (2003‐2004) where the response is binary and 1 covariate is subject to detection limit.  相似文献   

3.
Multiple imputation is commonly used to impute missing covariate in Cox semiparametric regression setting. It is to fill each missing data with more plausible values, via a Gibbs sampling procedure, specifying an imputation model for each missing variable. This imputation method is implemented in several softwares that offer imputation models steered by the shape of the variable to be imputed, but all these imputation models make an assumption of linearity on covariates effect. However, this assumption is not often verified in practice as the covariates can have a nonlinear effect. Such a linear assumption can lead to a misleading conclusion because imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous time invariant covariates in imputation model, we propose a method based on B‐splines function. To assess the performance of this method, we conducted a simulation study, where we compared the multiple imputation method using Bayesian splines imputation model with multiple imputation using Bayesian linear imputation model in survival analysis setting. We evaluated the proposed method on the motivated data set collected in HIV‐infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well to estimate hazard ratio compared with the linear imputation methods, when data are missing completely at random, or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

4.
Missing covariate data present a challenge to tree-structured methodology due to the fact that a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults.  相似文献   

5.
A variable is ‘systematically missing’ if it is missing for all individuals within particular studies in an individual participant data meta‐analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure–disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta‐analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between‐study random effects in the imputation model. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

6.
A popular method for analysing repeated‐measures data is generalized estimating equations (GEE). When response data are missing at random (MAR), two modifications of GEE use inverse‐probability weighting and imputation. The weighted GEE (WGEE) method involves weighting observations by their inverse probability of being observed, according to some assumed missingness model. Imputation methods involve filling in missing observations with values predicted by an assumed imputation model. WGEE are consistent when the data are MAR and the dropout model is correctly specified. Imputation methods are consistent when the data are MAR and the imputation model is correctly specified. Recently, doubly robust (DR) methods have been developed. These involve both a model for probability of missingness and an imputation model for the expectation of each missing observation, and are consistent when either is correct. We describe DR GEE, and illustrate their use on simulated data. We also analyse the INITIO randomized clinical trial of HIV therapy allowing for MAR dropout. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

7.
ABSTRACT: BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases. METHODS: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly-skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses. RESULTS: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate's effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure. CONCLUSIONS: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.  相似文献   

8.
ObjectivesRegardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases when some, many, or all item scores are missing in a multi-item instrument.Study Design and SettingReal-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the most optimal technique for each scenario. Fitted regression coefficients were compared using the bias and coverage as performance parameters.ResultsMean imputation caused biased estimates in every missing data scenario when data are missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score.ConclusionWe recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise not to use any form of mean imputation to handle missing data.  相似文献   

9.
Propensity scores have been used widely as a bias reduction method to estimate the treatment effect in nonrandomized studies. Since many covariates are generally included in the model for estimating the propensity scores, the proportion of subjects with at least one missing covariate could be large. While many methods have been proposed for propensity score‐based estimation in the presence of missing covariates, little has been published comparing the performance of these methods. In this article we propose a novel method called multiple imputation missingness pattern (MIMP) and compare it with the naive estimator (ignoring propensity score) and three commonly used methods of handling missing covariates in propensity score‐based estimation (separate estimation of propensity scores within each pattern of missing data, multiple imputation and discarding missing data) under different mechanisms of missing data and degree of correlation among covariates. Simulation shows that all adjusted estimators are much less biased than the naive estimator. Under certain conditions MIMP provides benefits (smaller bias and mean‐squared error) compared with existing alternatives. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

10.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

11.
Missing outcomes are a commonly occurring problem for cluster randomised trials, which can lead to biased and inefficient inference if ignored or handled inappropriately. Two approaches for analysing such trials are cluster‐level analysis and individual‐level analysis. In this study, we assessed the performance of unadjusted cluster‐level analysis, baseline covariate‐adjusted cluster‐level analysis, random effects logistic regression and generalised estimating equations when binary outcomes are missing under a baseline covariate‐dependent missingness mechanism. Missing outcomes were handled using complete records analysis and multilevel multiple imputation. We analytically show that cluster‐level analyses for estimating risk ratio using complete records are valid if the true data generating model has log link and the intervention groups have the same missingness mechanism and the same covariate effect in the outcome model. We performed a simulation study considering four different scenarios, depending on whether the missingness mechanisms are the same or different between the intervention groups and whether there is an interaction between intervention group and baseline covariate in the outcome model. On the basis of the simulation study and analytical results, we give guidance on the conditions under which each approach is valid. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.  相似文献   

12.
In various medical related researches, excessive zeros, which make the standard Poisson regression model inadequate, often exist in count data. We proposed a covariate‐dependent random effect model to accommodate the excess zeros and the heterogeneity in the population simultaneously. This work is motivated by a data set from a survey on the dental health status of Hong Kong preschool children where the response variable is the number of decayed, missing, or filled teeth. The random effect has a sound biological interpretation as the overall oral health status or other personal qualities of an individual child that is unobserved and unable to be quantified easily. The overall measure of oral health status, responsible for accommodating the excessive zeros and also the heterogeneity among the children, is covariate dependent. This covariate‐dependent random effect model allows one to distinguish whether a potential covariate has an effect on the conceived overall oral health condition of the children, that is, the random effect, or has a direct effect on the magnitude of the counts, or both. We proposed a multiple imputation approach for estimation of the parameters. We discussed the choice of the imputation size. We evaluated the performance of the proposed estimation method through simulation studies, and we applied the model and method to the dental data. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

13.
Although missing outcome data are an important problem in randomized trials and observational studies, methods to address this issue can be difficult to apply. Using simulated data, the authors compared 3 methods to handle missing outcome data: 1) complete case analysis; 2) single imputation; and 3) multiple imputation (all 3 with and without covariate adjustment). Simulated scenarios focused on continuous or dichotomous missing outcome data from randomized trials or observational studies. When outcomes were missing at random, single and multiple imputations yielded unbiased estimates after covariate adjustment. Estimates obtained by complete case analysis with covariate adjustment were unbiased as well, with coverage close to 95%. When outcome data were missing not at random, all methods gave biased estimates, but handling missing outcome data by means of 1 of the 3 methods reduced bias compared with a complete case analysis without covariate adjustment. Complete case analysis with covariate adjustment and multiple imputation yield similar estimates in the event of missing outcome data, as long as the same predictors of missingness are included. Hence, complete case analysis with covariate adjustment can and should be used as the analysis of choice more often. Multiple imputation, in addition, can accommodate the missing-not-at-random scenario more flexibly, making it especially suited for sensitivity analyses.  相似文献   

14.
We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log‐transformed numerator and denominator, and then calculate the ratio of interest; we call this ‘passive’ imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this ‘active’ imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high‐density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using Winbugs were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log‐transformation is preferable. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.  相似文献   

15.
Hot-deck imputation is an intuitively simple and popular method of accommodating incomplete data. Users of the method will often use the usual multiple imputation variance estimator which is not appropriate in this case. However, no variance expression has yet been derived for this easily implemented method applied to missing covariates in regression models. The simple hot-deck method is in fact asymptotically equivalent to the mean-score method for the estimation of a regression model parameter, so that hot-deck can be understood in the context of likelihood methods. Both of these methods accommodate data where missingness may depend on the observed variables but not on the unobserved value of the incomplete covariate, that is, missing at random (MAR). The asymptotic properties of hot-deck are derived here for the case where the fully observed variables are categorical, though the incomplete covariate(s) may be continuous. Simulation studies indicate that the two methods compare well in small samples and for small numbers of imputations. Current users of hot-deck may now conduct their analysis using mean-score, which is a weighted likelihood method and can thus be implemented by a single pass through the data using any standard package which accommodates weighted regression models. Valid inference is now straightforward using the variance expression provided here. The equivalence of mean-score and hot-deck is illustrated using three clinical data sets where an important covariate is missing for a large number of study subjects. © 1997 by John Wiley & Sons, Ltd.  相似文献   

16.
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within‐study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse‐variance weighted meta‐analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between‐study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse‐variance weighted meta‐analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta‐analysis, rather than meta‐analyzing each of the multiple imputations and then combining the meta‐analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.  相似文献   

17.
ObjectiveWe compared popular methods to handle missing data with multiple imputation (a more sophisticated method that preserves data).Study Design and SettingWe used data of 804 patients with a suspicion of deep venous thrombosis (DVT). We studied three covariates to predict the presence of DVT: d-dimer level, difference in calf circumference, and history of leg trauma. We introduced missing values (missing at random) ranging from 10% to 90%. The risk of DVT was modeled with logistic regression for the three methods, that is, complete case analysis, exclusion of d-dimer level from the model, and multiple imputation.ResultsMultiple imputation showed less bias in the regression coefficients of the three variables and more accurate coverage of the corresponding 90% confidence intervals than complete case analysis and dropping d-dimer level from the analysis. Multiple imputation showed unbiased estimates of the area under the receiver operating characteristic curve (0.88) compared with complete case analysis (0.77) and when the variable with missing values was dropped (0.65).ConclusionAs this study shows that simple methods to deal with missing data can lead to seriously misleading results, we advise to consider multiple imputation. The purpose of multiple imputation is not to create data, but to prevent the exclusion of observed data.  相似文献   

18.
In large epidemiological studies missing data can be a problem, especially if information is sought on a sensitive topic or when a composite measure is calculated from several variables each affected by missing values. Multiple imputation is the method of choice for 'filling in' missing data based on associations among variables. Using an example about body mass index from the Australian Longitudinal Study on Women's Health, we identify a subset of variables that are particularly useful for imputing values for the target variables. Then we illustrate two uses of multiple imputation. The first is to examine and correct for bias when data are not missing completely at random. The second is to impute missing values for an important covariate; in this case omission from the imputation process of variables to be used in the analysis may introduce bias. We conclude with several recommendations for handling issues of missing data.  相似文献   

19.
We propose a transition model for analysing data from complex longitudinal studies. Because missing values are practically unavoidable in large longitudinal studies, we also present a two-stage imputation method for handling general patterns of missing values on both the outcome and the covariates by combining multiple imputation with stochastic regression imputation. Our model is a time-varying auto-regression on the past innovations (residuals), and it can be used in cases where general dynamics must be taken into account, and where the model selection is important. The entire estimation process was carried out using available procedures in statistical packages such as SAS and S-PLUS. To illustrate the viability of the proposed model and the two-stage imputation method, we analyse data collected in an epidemiological study that focused on various factors relating to childhood growth. Finally, we present a simulation study to investigate the behaviour of our two-stage imputation procedure.  相似文献   

20.
Propensity score models are frequently used to estimate causal effects in observational studies. One unresolved issue in fitting these models is handling missing values in the propensity score model covariates. As these models usually contain a large set of covariates, using only individuals with complete data significantly decreases the sample size and statistical power. Several missing data imputation approaches have been proposed, including multiple imputation (MI), MI with missingness pattern (MIMP), and treatment mean imputation. Generalized boosted modeling (GBM), which is a nonparametric approach to estimate propensity scores, can automatically handle missingness in the covariates. Although the performance of MI, MIMP, and treatment mean imputation have previously been compared for binary treatments, they have not been compared for continuous exposures or with single imputation and GBM. We compared these approaches in estimating the generalized propensity score (GPS) for a continuous exposure in both a simulation study and in empirical data. Using GBM with the incomplete data to estimate the GPS did not perform well in the simulation. Missing values should be imputed before estimating propensity scores using GBM or any other approach for estimating the GPS.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号