Similar Articles
20 similar articles found (search time: 31 ms)
1.
Propensity score models are frequently used to estimate causal effects in observational studies. One unresolved issue in fitting these models is handling missing values in the propensity score model covariates. As these models usually contain a large set of covariates, using only individuals with complete data significantly decreases the sample size and statistical power. Several missing data imputation approaches have been proposed, including multiple imputation (MI), MI with missingness pattern (MIMP), and treatment mean imputation. Generalized boosted modeling (GBM), which is a nonparametric approach to estimate propensity scores, can automatically handle missingness in the covariates. Although the performance of MI, MIMP, and treatment mean imputation has previously been compared for binary treatments, they have not been compared for continuous exposures or with single imputation and GBM. We compared these approaches in estimating the generalized propensity score (GPS) for a continuous exposure in both a simulation study and in empirical data. Using GBM with the incomplete data to estimate the GPS did not perform well in the simulation. Missing values should be imputed before estimating propensity scores using GBM or any other approach for estimating the GPS.
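As a reader's aid (not taken from the paper itself): the GPS for a continuous exposure is commonly estimated as the conditional density of the exposure given covariates. A minimal sketch under a normal linear model, with illustrative function names and simulated data:

```python
import numpy as np

def generalized_propensity_score(exposure, covariates):
    """Estimate a GPS for a continuous exposure under a normal linear model.

    Sketch: regress the exposure on the covariates by OLS, then evaluate the
    normal density of each subject's exposure at its fitted mean.
    """
    X = np.column_stack([np.ones(len(exposure)), covariates])
    beta, *_ = np.linalg.lstsq(X, exposure, rcond=None)
    resid = exposure - X @ beta
    sigma2 = resid.var(ddof=X.shape[1])          # residual variance estimate
    return np.exp(-resid**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Simulated complete data: 500 subjects, 3 covariates, linear exposure model
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
A = Z @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=500)
gps = generalized_propensity_score(A, Z)
```

In practice the covariate matrix would first be completed by imputation, in line with the abstract's recommendation, before the GPS model is fitted.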

2.
In many observational studies, analysts estimate causal effects using propensity scores, e.g. by matching, sub-classifying, or inverse probability weighting based on the scores. Estimation of propensity scores is complicated when some values of the covariates are missing. Analysts can use multiple imputation to create completed data sets from which propensity scores can be estimated. We propose a general location mixture model for imputations that assumes that the control units are a latent mixture of (i) units whose covariates are drawn from the same distributions as the treated units' covariates and (ii) units whose covariates are drawn from different distributions. This formulation reduces the influence of control units outside the treated units' region of the covariate space on the estimation of parameters in the imputation model, which can result in more plausible imputations. In turn, this can result in more reliable estimates of propensity scores and better balance in the true covariate distributions when matching or sub-classifying. We illustrate the benefits of the latent class modeling approach with simulations and with an observational study of the effect of breast feeding on children's cognitive abilities.

3.
Overcoming bias due to confounding and missing data is challenging when analyzing observational data. Propensity scores are commonly used to account for the first problem and multiple imputation for the latter. Unfortunately, it is not known how best to proceed when both techniques are required. We investigate whether two different approaches to combining propensity scores and multiple imputation (Across and Within) lead to differences in the accuracy or precision of exposure effect estimates. Both approaches start by imputing missing values multiple times. Propensity scores are then estimated for each resulting dataset. Using the Across approach, the mean propensity score across imputations for each subject is used in a single subsequent analysis. Alternatively, the Within approach uses propensity scores individually to obtain exposure effect estimates in each imputation, which are combined to produce an overall estimate. These approaches were compared in a series of Monte Carlo simulations and applied to data from the British Society for Rheumatology Biologics Register. Results indicated that the Within approach produced unbiased estimates with appropriate confidence intervals, whereas the Across approach produced biased results and unrealistic confidence intervals. Researchers are encouraged to implement the Within approach when conducting propensity score analyses with incomplete data.
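The Across/Within contrast described in the abstract can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's simulation: it uses simulated data, a bare-bones IPW estimator for a binary treatment, and omits variance pooling via Rubin's rules.

```python
import numpy as np

def ipw_estimate(y, t, ps):
    """Toy inverse-probability-weighted treatment effect estimate."""
    treated = np.average(y[t == 1], weights=1.0 / ps[t == 1])
    control = np.average(y[t == 0], weights=1.0 / (1.0 - ps[t == 0]))
    return treated - control

def across_approach(y, t, ps_by_imputation):
    # Across: average each subject's propensity score over the m
    # imputations, then run a single analysis on the averaged scores.
    ps_bar = np.mean(ps_by_imputation, axis=0)
    return ipw_estimate(y, t, ps_bar)

def within_approach(y, t, ps_by_imputation):
    # Within: one effect estimate per imputed dataset, then pool the
    # estimates (a full analysis would also pool variances via Rubin's rules).
    estimates = [ipw_estimate(y, t, ps) for ps in ps_by_imputation]
    return float(np.mean(estimates))

rng = np.random.default_rng(1)
n, m = 200, 5
t = rng.integers(0, 2, size=n)
y = t * 0.5 + rng.normal(size=n)
# m slightly different propensity-score vectors, one per imputed dataset
ps_by_imputation = np.clip(0.5 + rng.normal(scale=0.05, size=(m, n)), 0.05, 0.95)
est_across = across_approach(y, t, ps_by_imputation)
est_within = within_approach(y, t, ps_by_imputation)
```

Note that when the propensity scores are identical across imputations the two approaches coincide; they diverge precisely because imputation-to-imputation variability in the scores enters the two pipelines at different points.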

4.
Pattern-mixture models provide a general and flexible framework for sensitivity analyses of nonignorable missing data. The placebo-based pattern-mixture model (Little and Yau, Biometrics 1996; 52:1324-1333) treats missing data in a transparent and clinically interpretable manner and has been used as sensitivity analysis for monotone missing data in longitudinal studies. The standard multiple imputation approach (Rubin, Multiple Imputation for Nonresponse in Surveys, 1987) is often used to implement the placebo-based pattern-mixture model. We show that Rubin's variance estimate of the multiple imputation estimator of treatment effect can be overly conservative in this setting. As an alternative to multiple imputation, we derive an analytic expression of the treatment effect for the placebo-based pattern-mixture model and propose a posterior simulation or delta method for the inference about the treatment effect. Simulation studies demonstrate that the proposed methods provide consistent variance estimates and outperform the imputation methods in terms of power for the placebo-based pattern-mixture model. We illustrate the methods using data from a clinical study of major depressive disorders. Copyright © 2013 John Wiley & Sons, Ltd.

5.
Many diseases, such as cancer and heart disease, are heterogeneous, and it is of great interest to study disease risk specific to subtypes in relation to genetic and environmental risk factors. However, for logistical and cost reasons, the subtype information for the disease is missing for some subjects. In this article, we investigate methods for multinomial logistic regression with missing outcome data, including a bootstrap hot deck multiple imputation (BHMI), simple inverse probability weighted (SIPW), augmented inverse probability weighted (AIPW), and expected estimating equation (EEE) estimators. These methods are important approaches for missing data regression. The BHMI modifies the standard hot deck multiple imputation method such that it can provide valid confidence interval estimation. When the covariates are discrete, the SIPW, AIPW, and EEE estimators are numerically identical. When the covariates are continuous, nonparametric smoothers can be applied to estimate the selection probabilities and the estimating scores. These methods perform similarly. Extensive simulations show that all of these methods yield unbiased estimators, while the complete-case (CC) analysis can be biased if the missingness depends on the observed data. Our simulations also demonstrate that these methods can gain substantial efficiency compared with the CC analysis. The methods are applied to a colorectal cancer study in which cancer subtype data are missing among some study individuals.

6.
Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

7.
Multiple imputation is commonly used to impute missing covariates in the Cox semiparametric regression setting. It fills each missing value with more plausible values via a Gibbs sampling procedure, specifying an imputation model for each missing variable. This imputation method is implemented in several software packages that offer imputation models steered by the shape of the variable to be imputed, but all of these imputation models assume linearity of covariate effects. This assumption is often not verified in practice, as covariates can have nonlinear effects. Such a linearity assumption can lead to misleading conclusions, because the imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous, time-invariant covariates in the imputation model, we propose a method based on B-spline functions. To assess the performance of this method, we conducted a simulation study in which we compared multiple imputation using a Bayesian spline imputation model with multiple imputation using a Bayesian linear imputation model in a survival analysis setting. We evaluated the proposed method on a motivating data set collected from HIV-infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well in estimating hazard ratios, compared with the linear imputation methods, when data are missing completely at random or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.

8.
Objectives: Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases when some, many, or all item scores are missing in a multi-item instrument.
Study Design and Setting: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the optimal technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance parameters.
Results: Mean imputation caused biased estimates in every missing data scenario when data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the item scores outperformed methods applied to the total score.
Conclusion: We recommend applying MI to the item scores to obtain the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data.

9.
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
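For readers unfamiliar with the pooling step mentioned in the abstract, Rubin's rules combine m per-imputation estimates and their within-imputation variances into one estimate and total variance. A minimal sketch (the numeric example is illustrative, not from the paper):

```python
def rubins_rules(estimates, variances):
    """Pool m point estimates and within-imputation variances (Rubin, 1987).

    Returns the pooled estimate q-bar and the total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation variance of the estimates.
    """
    m = len(estimates)
    qbar = sum(estimates) / m
    w = sum(variances) / m                                   # within
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)    # between
    return qbar, w + (1 + 1 / m) * b

# Three imputations of the same study-level coefficient
est, total_var = rubins_rules([1.1, 0.9, 1.0], [0.04, 0.05, 0.04])
```

In the workflow the abstract recommends, this pooling happens per study, and the pooled study-level estimates and variances then feed the inverse-variance weighted meta-analysis.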

10.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete-case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
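The FICO index defined at the end of the abstract is straightforward to compute: among subjects whose value of a given covariate is observed, take the fraction that are nonetheless incomplete cases. A minimal sketch with hypothetical toy data (the function name is illustrative):

```python
import numpy as np

def fico_index(covariate, complete_case):
    """Fraction of incomplete cases among the observed values of a covariate.

    covariate: 1-D array with np.nan marking missing entries
    complete_case: boolean array, True where the subject has complete data
    on all analysis variables
    """
    observed = ~np.isnan(covariate)
    return float(np.mean(~complete_case[observed]))

# Toy data: subject 2 is missing this covariate; subject 1 is missing
# some other analysis variable and is therefore not a complete case.
x = np.array([1.0, 2.0, np.nan, 3.0])
complete = np.array([True, False, False, True])
index = fico_index(x, complete)
```

A larger index suggests that MI can recover proportionally more of the covariate's observed information than CC analysis would use, which is the intuition behind the precision-gain interpretation.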

11.
In this paper, we consider fitting semiparametric additive hazards models for case-cohort studies using a multiple imputation approach. In a case-cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this as a special case of a missing covariate by design. We propose to employ a popular incomplete data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For imputation models, an imputation modeling procedure based on a rejection sampling is developed. A simple imputation modeling that can naturally be applied to a general missing-at-random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, a misspecification aspect in imputation modeling is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.

12.
The treatment of missing data in comparative effectiveness studies with right-censored outcomes and time-varying covariates is challenging because of the multilevel structure of the data. In particular, the performance of an accessible method like multiple imputation (MI) under an imputation model that ignores the multilevel structure is unknown and has not been compared to complete-case (CC) and single imputation methods that are most commonly applied in this context. Through an extensive simulation study, we compared statistical properties among CC analysis, last value carried forward, mean imputation, the use of missing indicators, and MI-based approaches with and without auxiliary variables under an extended Cox model when the interest lies in characterizing relationships between non-missing time-varying exposures and right-censored outcomes. MI demonstrated favorable properties under a moderate missing-at-random condition (absolute bias <0.1) and outperformed CC and single imputation methods, even when the MI method did not account for correlated observations in the imputation model. The performance of MI decreased with increasing complexity such as when the missing data mechanism involved the exposure of interest, but was still preferred over other methods considered and performed well in the presence of strong auxiliary variables. We recommend considering MI that ignores the multilevel structure in the imputation model when data are missing in a time-varying confounder, incorporating variables associated with missingness in the MI models as well as conducting sensitivity analyses across plausible assumptions.

13.
Statistics in Medicine 2017, 36(6):1014-1028
Breast cancers are clinically heterogeneous based on tumor markers. The National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program provides baseline data on these tumor markers for reporting cancer burden and trends over time in the US general population. These tumor markers, however, are often prone to missing observations. In particular, estrogen receptor (ER) status, a key biomarker in the study of breast cancer, has been collected since 1992 but historically was not well-reported, with missingness rates as high as 25% in early years. Previous methods used to correct estimates of breast cancer incidence or ER-related odds or prevalence ratios for unknown ER status have relied on a missing-at-random (MAR) assumption. In this paper, we explore the sensitivity of these key estimates to departures from MAR. We develop a predictive mean matching procedure that can be used to multiply impute missing ER status under either an MAR or a missing not at random assumption and apply it to the SEER breast cancer data (1992-2012). The imputation procedure uses the predictive power of the rich set of covariates available in the SEER registry while also allowing us to investigate the impact of departures from MAR. We find some differences in inference under the two assumptions, although the magnitude of differences tends to be small. For the types of analyses typically of primary interest, we recommend imputing SEER breast cancer biomarkers under an MAR assumption, given the small apparent differences under MAR and missing not at random assumptions. Copyright © 2016 John Wiley & Sons, Ltd.

14.
We propose a propensity score-based multiple imputation (MI) method to tackle missing data resulting from drop-outs and/or intermittently skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to convergence problems, but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.

15.
Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete-case analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is a common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear. We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H0(T), we approximate it by the Nelson-Aalen estimator of H(T) or estimate it by Cox regression. We compare the methods using simulation studies. We find that using log T biases covariate-outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson-Aalen estimator of H(T) in the imputation model. Copyright © 2009 John Wiley & Sons, Ltd.
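The Nelson-Aalen estimator recommended in the abstract is simple to compute and can then be passed, together with the event indicator, as a covariate to the imputation model. A minimal sketch assuming no tied event times (ties are handled only approximately, by processing tied times in sorted order):

```python
import numpy as np

def nelson_aalen(time, event):
    """Nelson-Aalen estimate of the cumulative hazard H(t), evaluated at
    each subject's observed event or censoring time.

    time:  observed times (event or censoring)
    event: 1 if the event occurred, 0 if censored
    """
    order = np.argsort(time)
    d = event[order].astype(float)
    n = len(d)
    at_risk = n - np.arange(n)       # risk-set size just before each time
    H_sorted = np.cumsum(d / at_risk)
    H = np.empty(n)
    H[order] = H_sorted              # map back to the original subject order
    return H

# Toy data: three subjects, the second censored at t = 1
H = nelson_aalen(np.array([2.0, 1.0, 3.0]), np.array([0, 1, 1]))
```

Each subject's H value would then enter the logistic or linear imputation model alongside D and the fully observed covariates Z.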

16.
A case study is presented assessing the impact of missing data on the analysis of daily diary data from a study evaluating the effect of a drug for the treatment of insomnia. The primary analysis averaged daily diary values for each patient into a weekly variable. Following the commonly used approach, missing daily values within a week were ignored provided there was a minimum number of diary reports (i.e., at least 4). A longitudinal model was then fit with treatment, time, and patient-specific effects. A treatment effect at a pre-specified landmark time was obtained from the model. Weekly values following dropout were regarded as missing, but intermittent daily missing values were obscured. Graphical summaries and tables are presented to characterize the complex missing data patterns. We use multiple imputation for daily diary data to create completed data sets so that exactly 7 daily diary values contribute to each weekly patient average. Standard analysis methods are then applied for landmark analysis of the completed data sets, and the resulting estimates are combined using the standard multiple imputation approach. The observed data are subject to digit heaping and patterned responses (e.g., identical values for several consecutive days), which makes accurate modeling of the response data difficult. Sensitivity analyses under different modeling assumptions for the data were performed, along with pattern mixture models assessing the sensitivity to the missing at random assumption. The emphasis is on graphical displays and computational methods that can be implemented with general-purpose software. Copyright © 2016 John Wiley & Sons, Ltd.

17.
Individual participant data meta-analyses (IPD-MA) are increasingly used for developing and validating multivariable (diagnostic or prognostic) risk prediction models. Unfortunately, some predictors or even outcomes may not have been measured in each study and are thus systematically missing in some individual studies of the IPD-MA. As a consequence, it is no longer possible to evaluate between-study heterogeneity and to estimate study-specific predictor effects, or to include all individual studies, which severely hampers the development and validation of prediction models. Here, we describe a novel approach for imputing systematically missing data and adopt a generalized linear mixed model to allow for between-study heterogeneity. This approach can be viewed as an extension of Resche-Rigon's method (Stat Med 2013), relaxing their assumptions regarding variance components and allowing imputation of linear and nonlinear predictors. We illustrate our approach using a case study with IPD-MA of 13 studies to develop and validate a diagnostic prediction model for the presence of deep venous thrombosis. We compare the results after applying four methods for dealing with systematically missing predictors in one or more individual studies: complete case analysis where studies with systematically missing predictors are removed, traditional multiple imputation ignoring heterogeneity across studies, stratified multiple imputation accounting for heterogeneity in predictor prevalence, and multilevel multiple imputation (MLMI) fully accounting for between-study heterogeneity. We conclude that MLMI may substantially improve the estimation of between-study heterogeneity parameters and allow for imputation of systematically missing predictors in IPD-MA aimed at the development and validation of prediction models. Copyright © 2015 John Wiley & Sons, Ltd.

18.
Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

19.
In many health services applications, research to determine the effectiveness of a particular treatment cannot be carried out using a controlled clinical trial. In settings such as these, observational studies must be used. Propensity score methods are useful tools to employ in order to balance the distribution of covariates between treatment groups and hence reduce the potential bias in treatment effect estimates in observational studies. A challenge in many health services research studies is the presence of missing data among the covariates that need to be balanced. In this paper, we compare three simple propensity models using data that examine the effectiveness of self-monitoring of blood glucose (SMBG) in reducing hemoglobin A1c in a cohort of 10,566 type 2 diabetics. The first propensity score model uses only subjects with complete case data (n=6,687), the second incorporates missing value indicators into the model, and the third fits separate propensity scores for each pattern of missing data. We compare the results of these methods and find that incorporating missing data into the propensity score model reduces the estimated effect of SMBG on hemoglobin A1c by more than 10%, although this reduction was not clinically significant. In addition, beginning with the complete data, we artificially introduce missing data using a nonignorable missing data mechanism and compare treatment effect estimates using the three propensity score methods and a simple analysis of covariance (ANCOVA) method. In these analyses, we find that the complete case analysis and the ANCOVA method both perform poorly, the missing value indicator model performs moderately well, and the pattern mixture model performs even better in estimating the original treatment effect observed in the complete data prior to the introduction of artificial missing data.
We conclude that in observational studies one must not only adjust for potentially confounding variables using methods such as propensity scores, but one should also account for missing data in these models in order to allow for causal inference more appropriately to be applied.

20.
The standard multiple imputation technique focuses on parameter estimation. In this study, we describe a method for conducting score tests following multiple imputation. As an important application, we use the Cochran-Mantel-Haenszel (CMH) test as a score test and compare the proposed multiple imputation method with a method based on the Wilson-Hilferty transformation of the CMH statistic. We show that the proposed multiple imputation method preserves the nominal significance level for three types of alternative hypotheses, whereas that based on the Wilson-Hilferty transformation inflates type I error for the "row means differ" and "general association" alternative hypotheses. Moreover, we find that this type I error inflation worsens as the amount of missing data increases.
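For reference, the Wilson-Hilferty transformation mentioned in the abstract maps a chi-square statistic with k degrees of freedom to an approximately standard-normal deviate via the cube-root of the scaled statistic. A minimal sketch (the function name is illustrative):

```python
import math

def wilson_hilferty(x, df):
    """Wilson-Hilferty approximation: transform a chi-square statistic with
    `df` degrees of freedom to an approximately standard-normal deviate,
    z = ((x/df)^(1/3) - (1 - 2/(9 df))) / sqrt(2/(9 df)).
    """
    c = 2.0 / (9.0 * df)
    return ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)

# The 95th percentile of chi-square with 9 df is about 16.92, so the
# transform should land near the normal 95th percentile (about 1.645)
z = wilson_hilferty(16.92, 9)
```

In the MI context this transformation has been used to combine per-imputation CMH statistics on a normal scale, which is the comparator the abstract finds anti-conservative for two of the three alternative hypotheses.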

