Similar Literature
20 similar articles found.
1.
Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing not at random mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given data set, its corresponding population marginal distribution might also be available in an external data source. We show how this information can be readily utilised in the imputation model to calibrate inference to the population by incorporating an appropriately calculated offset termed the "calibrated-δ adjustment." We describe the derivation of this offset from the population distribution of the incomplete variable and show how, in applications, it can be used to closely (and often exactly) match the post-imputation distribution to the population level. Through analytic and simulation studies, we show that our proposed calibrated-δ adjustment MI method can give the same inference as standard MI when data are MAR, and can produce more accurate inference under two general missing not at random mechanisms. The method is used to impute missing ethnicity data in a type 2 diabetes prevalence case study using UK primary care electronic health records, where it results in scientifically relevant changes in inference for non-White ethnic groups compared with standard MI. Calibrated-δ adjustment MI represents a pragmatic approach for utilising available population-level information in a sensitivity analysis to explore potential departures from the MAR assumption.
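To make the idea concrete: for a binary incomplete variable imputed from a logistic model, the calibrated-δ offset can be found numerically as the shift to the linear predictor that makes the expected post-imputation proportion match the known population proportion. The Python sketch below is an illustration of that calibration step, not the authors' derivation; all function and variable names are hypothetical.

    import numpy as np
    from scipy.optimize import brentq
    from sklearn.linear_model import LogisticRegression

    def calibrated_delta(X_obs, y_obs, X_mis, pop_prop):
        # Fit the MAR imputation model on the complete cases.
        model = LogisticRegression().fit(X_obs, y_obs)
        eta = model.decision_function(X_mis)   # linear predictor, incomplete cases

        def gap(delta):
            # Expected overall proportion of 1's after imputing with offset delta.
            p_mis = 1.0 / (1.0 + np.exp(-(eta + delta)))
            overall = (y_obs.sum() + p_mis.sum()) / (len(y_obs) + len(eta))
            return overall - pop_prop          # zero when calibrated to the population

        return brentq(gap, -20.0, 20.0)        # solve for the calibrated-delta offset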

2.
Multiple imputation has become easier to perform with the advent of several software packages that provide imputations under a multivariate normal model, but imputation of missing binary data remains an important practical problem. Here, we explore three alternative methods for converting a multivariate normal imputed value into a binary imputed value: (1) simple rounding of the imputed value to the nearer of 0 and 1, (2) a Bernoulli draw based on a 'coin flip', where an imputed value between 0 and 1 is treated as the probability of drawing a 1, and (3) an adaptive rounding scheme, where the cut-off value for determining whether to round to 0 or 1 is based on a normal approximation to the binomial distribution, making use of the marginal proportions of 0's and 1's on the variable. We perform simulation studies on a data set of 206,802 respondents to the California Healthy Kids Survey, where the fully observed data on 198,262 individuals define the population, from which we repeatedly draw samples with missing data, impute, calculate statistics and confidence intervals, and compare bias and coverage against the true values. We frequently found satisfactory bias and coverage properties, suggesting that approaches such as these, based on statistical approximations, are preferable in applied research to avoiding settings where missing data occur or relying on complete-case analyses. Considering both the occurrence and extent of deficits in coverage, we found that adaptive rounding provided the best performance.
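A minimal sketch of the three conversion rules compared above, assuming the normal-model imputed value z and the marginal proportion of 1's are at hand; the cut-off in the adaptive scheme is the standard normal-approximation rule, and the function names are ours.

    import numpy as np
    from scipy.stats import norm

    def simple_round(z):
        return (z >= 0.5).astype(int)              # round to the nearer of 0 and 1

    def coin_flip(z, rng):
        p = np.clip(z, 0.0, 1.0)                   # imputed value treated as Pr(1)
        return rng.binomial(1, p)

    def adaptive_round(z, marginal_p):
        # Cut-off from a normal approximation to the binomial, using the
        # marginal proportion of 1's on the variable.
        c = marginal_p - norm.ppf(marginal_p) * np.sqrt(marginal_p * (1.0 - marginal_p))
        return (z >= c).astype(int)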

3.
Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments.
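The chained-equations cycle itself is easy to sketch. The Python fragment below is a schematic stand-in for the paper's Stata fragments: it cycles through incomplete continuous columns, regressing each on the others and adding a residual draw. A full implementation would also draw the regression parameters from their posterior at each step.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    def mice_cycle(df, n_cycles=10, rng=None):
        # Schematic chained-equations pass for continuous variables only.
        rng = rng or np.random.default_rng()
        miss = df.isna()
        incomplete = [c for c in df.columns if miss[c].any()]
        data = df.fillna(df.mean())                      # crude starting values
        for _ in range(n_cycles):
            for col in incomplete:
                X = data.drop(columns=col)
                fit = LinearRegression().fit(X[~miss[col]], df.loc[~miss[col], col])
                resid = df.loc[~miss[col], col] - fit.predict(X[~miss[col]])
                noise = rng.normal(0.0, resid.std(), miss[col].sum())
                data.loc[miss[col], col] = fit.predict(X[miss[col]]) + noise
        return data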

4.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete-case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO).
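The proposed index is straightforward to compute. Under a direct reading of the definition quoted above, for a covariate x, FICO is the proportion of rows with x observed that are nonetheless incomplete cases; a one-function sketch:

    import pandas as pd

    def fico(df, covariate):
        # Among rows where `covariate` is observed, the fraction that are
        # incomplete cases (i.e. missing on at least one other variable).
        observed = df[df[covariate].notna()]
        return observed.isna().any(axis=1).mean()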

5.
Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster, and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout.

6.
Conventional pattern-mixture models can be highly sensitive to model misspecification. In many longitudinal studies, where the nature of the drop-out and the form of the population model are unknown, interval estimates from any single pattern-mixture model may suffer from undercoverage, because uncertainty about model misspecification is not taken into account. In this article, a new class of Bayesian random coefficient pattern-mixture models is developed to address potentially non-ignorable drop-out. Instead of imposing hard equality constraints to overcome inherent inestimability problems in pattern-mixture models, we propose to smooth the polynomial coefficient estimates across patterns using a hierarchical Bayesian model that allows random variation across groups. Using real and simulated data, we show that multiple imputation under a three-level linear mixed-effects model which accommodates a random level due to drop-out groups can be an effective method to deal with non-ignorable drop-out by allowing model uncertainty to be incorporated into the imputation process.

7.
Multiple imputation (MI) is a technique that can be used for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, containing possibly different but reasonable replacements for the missing data. Users analyse the completed datasets separately with standard techniques and then combine the results using simple formulae in a way that allows the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose an approach to utilise MI to handle the problems of missing gestational ages and implausible birthweight–gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then imputing missing and implausible gestational age values multiple times. Because missing and implausible gestational age values are not missing completely at random, using multiple imputations and thus incorporating both the existing relationships among the variables and the uncertainty added from the imputation may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
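The "simple formulae" for combining results across completed datasets are Rubin's rules; a minimal sketch for a scalar estimate:

    import numpy as np

    def pool(estimates, variances):
        # Rubin's rules: pool m point estimates and their within-imputation
        # variances into one estimate and a total variance that reflects
        # the extra uncertainty due to missing data.
        q = np.asarray(estimates, dtype=float)
        u = np.asarray(variances, dtype=float)
        m = len(q)
        qbar = q.mean()                       # pooled point estimate
        ubar = u.mean()                       # average within-imputation variance
        b = q.var(ddof=1)                     # between-imputation variance
        total = ubar + (1.0 + 1.0 / m) * b    # Rubin's total variance
        return qbar, total

    # e.g. pool([1.2, 1.4, 1.3], [0.04, 0.05, 0.04])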

8.
Clinical trials allow researchers to draw conclusions about the effectiveness of a treatment. However, the statistical analysis used to draw these conclusions will inevitably be complicated by the common problem of attrition. Resorting to ad hoc methods such as case deletion or mean imputation can lead to biased results, especially if the amount of missing data is high. Multiple imputation, on the other hand, provides the researcher with an approximate solution that can be generalized to a number of different data sets and statistical problems. Multiple imputation is known to be statistically valid when n is large. However, questions still remain about the validity of multiple imputation for small samples in clinical trials. In this paper we investigate the small-sample performance of several multiple imputation methods, as well as the last observation carried forward method.

9.
Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete cases analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear. We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H0(T), we approximate it by the Nelson–Aalen estimator of H(T) or estimate it by Cox regression. We compare the methods using simulation studies. We find that using logT biases covariate‐outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson–Aalen estimator of H(T) in the imputation model. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   
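A sketch of the recommended recipe for a binary incomplete covariate, with a from-scratch Nelson–Aalen estimate (ties ignored for brevity). A full MI step would additionally draw the logistic coefficients from their approximate posterior; the function names here are ours.

    import numpy as np
    import statsmodels.api as sm

    def nelson_aalen(time, event):
        # Cumulative hazard H(t) evaluated at each subject's own time;
        # assumes no tied event times.
        order = np.argsort(time)
        at_risk = len(time) - np.arange(len(time))       # risk-set sizes
        H_sorted = np.cumsum(event[order] / at_risk)     # hazard increments
        H = np.empty(len(time))
        H[order] = H_sorted
        return H

    def impute_binary_x(x, d, time, z, rng):
        # Logistic regression of x on the event indicator d, the
        # Nelson-Aalen cumulative hazard H(T), and other covariates z.
        H = nelson_aalen(time, d)
        design = sm.add_constant(np.column_stack([d, H, z]))
        obs = ~np.isnan(x)
        fit = sm.Logit(x[obs], design[obs]).fit(disp=0)
        x = x.copy()
        x[~obs] = rng.binomial(1, fit.predict(design[~obs]))
        return x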

10.
Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case–control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case–control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction.
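The shuffle-and-bin step is simple to express; a sketch of one replicate (the within-bin imputation and stepwise conditional logistic selection then run on each bin, and the whole shuffle is repeated):

    import numpy as np

    def shuffle_into_bins(protein_cols, bin_size, rng):
        # One random shuffle of the protein columns into bins of roughly
        # bin_size each, as in the repeated-shuffle scheme described above.
        cols = np.array(protein_cols)
        rng.shuffle(cols)
        return [cols[i:i + bin_size] for i in range(0, len(cols), bin_size)]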

11.
Harel O, Zhou XH. Statistics in Medicine 2007;26(11):2370–2388.
Two-phase designs are common in epidemiological studies of dementia, and especially in Alzheimer research. In the first phase, all subjects are screened using a common screening test(s), while in the second phase, only a subset of these subjects is tested using a more definitive verification assessment, i.e. a gold standard test. When comparing the accuracy of two screening tests in a two-phase study of dementia, inferences are commonly made using only the verified sample. It is well documented that in this case there is a risk of bias, called verification bias. When the two screening tests have only two values (e.g. positive and negative) and we are trying to estimate the differences in sensitivities and specificities of the tests, one is actually estimating a confidence interval for differences of binomial proportions, which is not a trivial task even with complete data. In this paper, we suggest ways to apply imputation procedures in order to correct the verification bias. This procedure allows us to use well-established complete-data methods to deal with the difficulty of estimating the difference of two binomial proportions, in addition to dealing with incomplete data. We compare different methods of estimation and evaluate the use of multiple imputation in this case. Our simulation results show that the use of multiple imputation is superior to other commonly used methods. We demonstrate our findings using Alzheimer data.

12.
This research is motivated by studying the progression of age-related macular degeneration, where both a covariate and the response variable are subject to censoring. We develop a general framework to handle regression with a censored covariate, where the response can be of different types and the censoring can be random or subject to (constant) detection limits. Multiple imputation is a popular technique for handling missing data that requires compatibility between the imputation model and the substantive model to obtain valid estimates. With a censored covariate, we propose a novel multiple imputation-based approach, namely, the semiparametric two-step importance sampling imputation (STISI) method, to impute the censored covariate. Specifically, STISI imputes the missing covariate from a semiparametric accelerated failure time model conditional on fully observed covariates (Step 1), with the acceptance probability derived from the substantive model (Step 2). The two-step procedure automatically ensures compatibility and takes full advantage of the relaxed semiparametric assumption in the imputation. Extensive simulations demonstrate that the STISI method yields valid estimates in all scenarios and outperforms some existing methods that are commonly used in practice. We apply STISI to data from the Age-Related Eye Disease Study to investigate the association between the progression time of the less severe eye and that of the more severe eye. We also illustrate the method by analyzing the urine arsenic data for patients from the National Health and Nutrition Examination Survey (2003-2004), where the response is binary and one covariate is subject to a detection limit.
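The two-step structure lends itself to a rejection-sampling skeleton. The sketch below mirrors that structure only, not STISI's estimating equations; draw_candidate and accept_prob are user-supplied callables standing in for the fitted AFT imputation model and the substantive-model acceptance probability.

    import numpy as np

    def stisi_impute_one(c_time, draw_candidate, accept_prob, rng, max_tries=1000):
        # Step 1: draw a candidate value above the censoring time from an
        # AFT-type imputation model. Step 2: accept the candidate with a
        # probability derived from the substantive model.
        for _ in range(max_tries):
            x = draw_candidate()
            if x <= c_time:
                continue                       # respect the censoring constraint
            if rng.uniform() < accept_prob(x):
                return x
        return c_time                           # fallback if acceptance is rare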

13.
Longitudinal cohort studies often collect both repeated measurements of longitudinal outcomes and times to clinical events whose occurrence precludes further longitudinal measurements. Although joint modeling of the clinical events and the longitudinal data can be used to provide valid statistical inference for target estimands in certain contexts, the application of joint models in the medical literature is currently rather restricted because of the complexity of the joint models and the intensive computation involved. We propose a multiple imputation approach to jointly impute missing data of both the longitudinal and clinical event outcomes. With complete imputed datasets, analysts are then able to use simple and transparent statistical methods and standard statistical software to perform various analyses without dealing with the complications of missing data and joint modeling. We show that the proposed multiple imputation approach is flexible and easy to implement in practice. Numerical results are also provided to demonstrate its performance.

14.
In designed longitudinal studies, information from the same set of subjects is collected repeatedly over time. The longitudinal measurements are often subject to missing data, which pose an analytic challenge. We propose a functional multiple imputation approach modeling longitudinal response profiles as smooth curves of time under a functional mixed effects model. We develop a Gibbs sampling algorithm to draw model parameters and imputations for missing values, using a blocking technique for increased computational efficiency. In an illustrative example, we apply a multiple imputation analysis to data from the Panel Study of Income Dynamics and the Child Development Supplement to investigate the gradient effect of family income on children's health status. Our simulation study demonstrates that this approach performs well under varying modeling assumptions on the time trajectory functions and missingness patterns.

15.
It is common for longitudinal clinical trials to face problems of item non-response, unit non-response, and drop-out. In this paper, we compare two alternative methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression. One approach combines hot-deck (HD) multiple imputation using a predictive mean matching method for item non-response and the approximate Bayesian bootstrap for unit non-response. A second method is based on a multivariate normal (MVN) model using PROC MI in SAS software V8.2. These two methods are contrasted with a last observation carried forward (LOCF) technique and available-case (AC) analysis in a simulation study where replicate analyses are performed on subsets of the originally complete cases. Missing-data patterns were simulated to be consistent with those found in the originally incomplete cases, and observed complete-data means were taken to be the targets of estimation. Not surprisingly, the LOCF and AC methods had poor coverage properties for many of the variables evaluated. Multiple imputation under the MVN model performed well for most variables but produced less than nominal coverage for variables with highly skewed distributions. The HD method consistently produced close to nominal coverage, with interval widths that were roughly 7 per cent larger on average than those produced from the MVN model.
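A sketch of the predictive-mean-matching hot deck at the heart of the first approach (k nearest donors on the predicted mean; arrays are assumed to be numpy arrays and the function name is ours):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def pmm_hotdeck(X_obs, y_obs, X_mis, k=5, rng=None):
        # Regress y on X among respondents, then for each incomplete case
        # donate the observed y of one of the k donors whose predicted
        # means are closest to the case's predicted mean.
        rng = rng or np.random.default_rng()
        fit = LinearRegression().fit(X_obs, y_obs)
        pred_obs = fit.predict(X_obs)
        pred_mis = fit.predict(X_mis)
        imputed = np.empty(len(pred_mis))
        for i, pm in enumerate(pred_mis):
            donors = np.argsort(np.abs(pred_obs - pm))[:k]   # k nearest donors
            imputed[i] = y_obs[rng.choice(donors)]
        return imputed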

16.
Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box–Cox or non-parametric transformation, and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching.
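A sketch of the transform-impute-backtransform recipe for a skewed positive variable, using SciPy's Box–Cox pair. The marginal normal draw here is a deliberately simplified stand-in for the full conditional imputation model the paper evaluates.

    import numpy as np
    from scipy.stats import boxcox
    from scipy.special import inv_boxcox

    def impute_via_boxcox(x, rng):
        # x: 1-D array of a positive variable with NaNs for missing values.
        obs = ~np.isnan(x)
        z_obs, lam = boxcox(x[obs])               # estimate lambda from observed data
        mu, sd = z_obs.mean(), z_obs.std(ddof=1)
        z_mis = rng.normal(mu, sd, (~obs).sum())  # marginal normal draw (illustrative)
        x = x.copy()
        x[~obs] = inv_boxcox(z_mis, lam)          # back-transform to the raw scale
        return x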

17.
Missing data are ubiquitous in longitudinal studies. In this paper, we propose an imputation procedure to handle dropouts in longitudinal studies. By taking advantage of the monotone missing pattern resulting from dropouts, our imputation procedure can be carried out sequentially, which substantially reduces the computational complexity. In addition, at each step of the sequential imputation, we set up a model selection mechanism that chooses between a parametric model and a nonparametric model to impute each missing observation. Unlike usual model selection procedures that aim at finding a single model fitting the entire data set well, our model selection procedure is customized to find a suitable model for the prediction of each missing observation.

18.
Missing covariate data present a challenge to tree-structured methodology because a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults.

19.
20.
Missing data due to loss to follow-up or intercurrent events are unintended, but unfortunately inevitable, in clinical trials. Since the true values of missing data are never known, it is necessary to assess the impact of untestable and unavoidable assumptions about any unobserved data in sensitivity analysis. This tutorial provides an overview of controlled multiple imputation (MI) techniques and a practical guide to their use for sensitivity analysis of trials with missing continuous outcome data. These include δ-based and reference-based MI procedures. In δ-based imputation, an offset term, δ, is typically added to the expected value of the missing data to assess the impact of unobserved participants having a worse or better response than those observed. Reference-based imputation draws imputed values with some reference to observed data in other groups of the trial, typically in other treatment arms. We illustrate the accessibility of these methods using data from a pediatric eczema trial and a chronic headache trial, and provide Stata code to facilitate adoption. We discuss issues surrounding the choice of δ in δ-based sensitivity analysis. We also review the debate on variance estimation within reference-based analysis and justify the use of Rubin's variance estimator in this setting since, as we elaborate on further within the tutorial, it provides information-anchored inference.
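A minimal sketch of the δ-based procedure for a continuous outcome: shift the MAR-based imputations by each offset and re-run the analysis. Here analyse is a user-supplied callable returning the estimate of interest; a full analysis would apply the shift within each of the m imputed datasets and pool with Rubin's rules.

    import numpy as np

    def delta_sensitivity(y_obs, y_imp_mar, deltas, analyse):
        # Shift the MAR imputations by delta to represent unobserved
        # participants responding worse or better than comparable observed
        # participants (which sign means "worse" depends on the outcome
        # scale), then re-analyse the completed data at each delta.
        results = {}
        for d in deltas:
            completed = np.concatenate([np.asarray(y_obs), np.asarray(y_imp_mar) + d])
            results[d] = analyse(completed)
        return results

    # e.g. delta_sensitivity(y_obs, y_imp, deltas=[-2, -1, 0, 1, 2], analyse=np.mean)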
