首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Multiple imputation is commonly used to impute missing covariate in Cox semiparametric regression setting. It is to fill each missing data with more plausible values, via a Gibbs sampling procedure, specifying an imputation model for each missing variable. This imputation method is implemented in several softwares that offer imputation models steered by the shape of the variable to be imputed, but all these imputation models make an assumption of linearity on covariates effect. However, this assumption is not often verified in practice as the covariates can have a nonlinear effect. Such a linear assumption can lead to a misleading conclusion because imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous time invariant covariates in imputation model, we propose a method based on B‐splines function. To assess the performance of this method, we conducted a simulation study, where we compared the multiple imputation method using Bayesian splines imputation model with multiple imputation using Bayesian linear imputation model in survival analysis setting. We evaluated the proposed method on the motivated data set collected in HIV‐infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well to estimate hazard ratio compared with the linear imputation methods, when data are missing completely at random, or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

2.
Missing outcomes are a commonly occurring problem for cluster randomised trials, which can lead to biased and inefficient inference if ignored or handled inappropriately. Two approaches for analysing such trials are cluster‐level analysis and individual‐level analysis. In this study, we assessed the performance of unadjusted cluster‐level analysis, baseline covariate‐adjusted cluster‐level analysis, random effects logistic regression and generalised estimating equations when binary outcomes are missing under a baseline covariate‐dependent missingness mechanism. Missing outcomes were handled using complete records analysis and multilevel multiple imputation. We analytically show that cluster‐level analyses for estimating risk ratio using complete records are valid if the true data generating model has log link and the intervention groups have the same missingness mechanism and the same covariate effect in the outcome model. We performed a simulation study considering four different scenarios, depending on whether the missingness mechanisms are the same or different between the intervention groups and whether there is an interaction between intervention group and baseline covariate in the outcome model. On the basis of the simulation study and analytical results, we give guidance on the conditions under which each approach is valid. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.  相似文献   

3.
ABSTRACT: BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases. METHODS: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly-skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses. RESULTS: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate's effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure. CONCLUSIONS: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.  相似文献   

4.
Although missing outcome data are an important problem in randomized trials and observational studies, methods to address this issue can be difficult to apply. Using simulated data, the authors compared 3 methods to handle missing outcome data: 1) complete case analysis; 2) single imputation; and 3) multiple imputation (all 3 with and without covariate adjustment). Simulated scenarios focused on continuous or dichotomous missing outcome data from randomized trials or observational studies. When outcomes were missing at random, single and multiple imputations yielded unbiased estimates after covariate adjustment. Estimates obtained by complete case analysis with covariate adjustment were unbiased as well, with coverage close to 95%. When outcome data were missing not at random, all methods gave biased estimates, but handling missing outcome data by means of 1 of the 3 methods reduced bias compared with a complete case analysis without covariate adjustment. Complete case analysis with covariate adjustment and multiple imputation yield similar estimates in the event of missing outcome data, as long as the same predictors of missingness are included. Hence, complete case analysis with covariate adjustment can and should be used as the analysis of choice more often. Multiple imputation, in addition, can accommodate the missing-not-at-random scenario more flexibly, making it especially suited for sensitivity analyses.  相似文献   

5.
This research is motivated by studying the progression of age‐related macular degeneration where both a covariate and the response variable are subject to censoring. We develop a general framework to handle regression with censored covariate where the response can be different types and the censoring can be random or subject to (constant) detection limits. Multiple imputation is a popular technique to handle missing data that requires compatibility between the imputation model and the substantive model to obtain valid estimates. With censored covariate, we propose a novel multiple imputation‐based approach, namely, the semiparametric two‐step importance sampling imputation (STISI) method, to impute the censored covariate. Specifically, STISI imputes the missing covariate from a semiparametric accelerated failure time model conditional on fully observed covariates (Step 1) with the acceptance probability derived from the substantive model (Step 2). The 2‐step procedure automatically ensures compatibility and takes full advantage of the relaxed semiparametric assumption in the imputation. Extensive simulations demonstrate that the STISI method yields valid estimates in all scenarios and outperforms some existing methods that are commonly used in practice. We apply STISI on data from the Age‐related Eye Disease Study, to investigate the association between the progression time of the less severe eye and that of the more severe eye. We also illustrate the method by analyzing the urine arsenic data for patients from National Health and Nutrition Examination Survey (2003‐2004) where the response is binary and 1 covariate is subject to detection limit.  相似文献   

6.
During drug development, a key step is the identification of relevant covariates predicting between-subject variations in drug response. The full random effects model (FREM) is one of the full-covariate approaches used to identify relevant covariates in nonlinear mixed effects models. Here we explore the ability of FREM to handle missing (both missing completely at random (MCAR) and missing at random (MAR)) covariate data and compare it to the full fixed-effects model (FFEM) approach, applied either with complete case analysis or mean imputation. A global health dataset (20 421 children) was used to develop a FREM describing the changes of height for age Z-score (HAZ) over time. Simulated datasets (n = 1000) were generated with variable rates of missing (MCAR) covariate data (0%-90%) and different proportions of missing (MAR) data condition on either observed covariates or predicted HAZ. The three methods were used to re-estimate model and compared in terms of bias and precision which showed that FREM had only minor increases in bias and minor loss of precision at increasing percentages of missing (MCAR) covariate data and performed similarly in the MAR scenarios. Conversely, the FFEM approaches either collapsed at 70% of missing (MCAR) covariate data (FFEM complete case analysis) or had large bias increases and loss of precision (FFEM with mean imputation). Our results suggest that FREM is an appropriate approach to covariate modeling for datasets with missing (MCAR and MAR) covariate data, such as in global health studies.  相似文献   

7.
Incomplete data are generally a challenge to the analysis of most large studies. The current gold standard to account for missing data is multiple imputation, and more specifically multiple imputation with chained equations (MICE). Numerous studies have been conducted to illustrate the performance of MICE for missing covariate data. The results show that the method works well in various situations. However, less is known about its performance in more complex models, specifically when the outcome is multivariate as in longitudinal studies. In current practice, the multivariate nature of the longitudinal outcome is often neglected in the imputation procedure, or only the baseline outcome is used to impute missing covariates. In this work, we evaluate the performance of MICE using different strategies to include a longitudinal outcome into the imputation models and compare it with a fully Bayesian approach that jointly imputes missing values and estimates the parameters of the longitudinal model. Results from simulation and a real data example show that MICE requires the analyst to correctly specify which components of the longitudinal process need to be included in the imputation models in order to obtain unbiased results. The full Bayesian approach, on the other hand, does not require the analyst to explicitly specify how the longitudinal outcome enters the imputation models. It performed well under different scenarios. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

8.
In various medical related researches, excessive zeros, which make the standard Poisson regression model inadequate, often exist in count data. We proposed a covariate‐dependent random effect model to accommodate the excess zeros and the heterogeneity in the population simultaneously. This work is motivated by a data set from a survey on the dental health status of Hong Kong preschool children where the response variable is the number of decayed, missing, or filled teeth. The random effect has a sound biological interpretation as the overall oral health status or other personal qualities of an individual child that is unobserved and unable to be quantified easily. The overall measure of oral health status, responsible for accommodating the excessive zeros and also the heterogeneity among the children, is covariate dependent. This covariate‐dependent random effect model allows one to distinguish whether a potential covariate has an effect on the conceived overall oral health condition of the children, that is, the random effect, or has a direct effect on the magnitude of the counts, or both. We proposed a multiple imputation approach for estimation of the parameters. We discussed the choice of the imputation size. We evaluated the performance of the proposed estimation method through simulation studies, and we applied the model and method to the dental data. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

9.
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within‐study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse‐variance weighted meta‐analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between‐study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse‐variance weighted meta‐analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta‐analysis, rather than meta‐analyzing each of the multiple imputations and then combining the meta‐analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.  相似文献   

10.

The availability of data in the healthcare domain provides great opportunities for the discovery of new or hidden patterns in medical data, which can eventually lead to improved clinical decision making. Predictive models play a crucial role in extracting this unknown information from data. However, medical data often contain missing values that can degrade the performance of predictive models. Autoencoder models have been widely used as non-linear functions for the imputation of missing data in fields such as computer vision, transportation, and finance. In this study, we assess the shortcomings of autoencoder models for data imputation and propose modified models to improve imputation performance. To evaluate, we compare the performance of the proposed model with five well-known imputation techniques on six medical datasets and five classification methods. Through extensive experiments, we demonstrate that the proposed non-linear imputation model outperforms the other models for all degrees of missing ratios and leads to the highest disease classification accuracy for all datasets.

  相似文献   

11.
Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear. We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H0(T), we approximate it by the Nelson–Aalen estimator of H(T) or estimate it by Cox regression. We compare the methods using simulation studies. We find that using logT biases covariate‐outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson–Aalen estimator of H(T) in the imputation model. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

12.
In large epidemiological studies missing data can be a problem, especially if information is sought on a sensitive topic or when a composite measure is calculated from several variables each affected by missing values. Multiple imputation is the method of choice for 'filling in' missing data based on associations among variables. Using an example about body mass index from the Australian Longitudinal Study on Women's Health, we identify a subset of variables that are particularly useful for imputing values for the target variables. Then we illustrate two uses of multiple imputation. The first is to examine and correct for bias when data are not missing completely at random. The second is to impute missing values for an important covariate; in this case omission from the imputation process of variables to be used in the analysis may introduce bias. We conclude with several recommendations for handling issues of missing data.  相似文献   

13.
A variable is ‘systematically missing’ if it is missing for all individuals within particular studies in an individual participant data meta‐analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure–disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta‐analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between‐study random effects in the imputation model. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

14.
Missing covariate data present a challenge to tree-structured methodology due to the fact that a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults.  相似文献   

15.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

16.
In this paper, we consider fitting semiparametric additive hazards models for case‐cohort studies using a multiple imputation approach. In a case‐cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this as a special case of a missing covariate by design. We propose to employ a popular incomplete data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For imputation models, an imputation modeling procedure based on a rejection sampling is developed. A simple imputation modeling that can naturally be applied to a general missing‐at‐random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, a misspecification aspect in imputation modeling is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.  相似文献   

17.
Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald‐type statistics and weighted likelihood‐ratio tests are proposed and evaluated in simulation studies. The Wald‐based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.  相似文献   

18.
Propensity scores have been used widely as a bias reduction method to estimate the treatment effect in nonrandomized studies. Since many covariates are generally included in the model for estimating the propensity scores, the proportion of subjects with at least one missing covariate could be large. While many methods have been proposed for propensity score‐based estimation in the presence of missing covariates, little has been published comparing the performance of these methods. In this article we propose a novel method called multiple imputation missingness pattern (MIMP) and compare it with the naive estimator (ignoring propensity score) and three commonly used methods of handling missing covariates in propensity score‐based estimation (separate estimation of propensity scores within each pattern of missing data, multiple imputation and discarding missing data) under different mechanisms of missing data and degree of correlation among covariates. Simulation shows that all adjusted estimators are much less biased than the naive estimator. Under certain conditions MIMP provides benefits (smaller bias and mean‐squared error) compared with existing alternatives. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

19.
We explore several approaches for imputing partially observed covariates when the outcome of interest is a censored event time and when there is an underlying subset of the population that will never experience the event of interest. We call these subjects ‘cured’, and we consider the case where the data are modeled using a Cox proportional hazards (CPH) mixture cure model. We study covariate imputation approaches using fully conditional specification. We derive the exact conditional distribution and suggest a sampling scheme for imputing partially observed covariates in the CPH cure model setting. We also propose several approximations to the exact distribution that are simpler and more convenient to use for imputation. A simulation study demonstrates that the proposed imputation approaches outperform existing imputation approaches for survival data without a cure fraction in terms of bias in estimating CPH cure model parameters. We apply our multiple imputation techniques to a study of patients with head and neck cancer. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

20.
Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non‐normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non‐parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non‐normal data. We generated data from a range of non‐normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero‐skewness log, Box–Cox or non‐parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term‐born infants. The results provide novel empirical evidence that the decision regarding how to impute a non‐normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non‐linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号