Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
The missing-indicator method and conditional logistic regression have been recommended as alternative approaches for data analysis in matched case-control studies with missing exposure values. The authors evaluated the performance of the two methods using Monte Carlo simulation. Data were generated from a 1:m matched design based on McNemar's 2 x 2 tables with four scenarios for missing values: completely-at-random, case-dependent, exposure-dependent, and case/exposure-dependent. In their analysis, the authors used conditional logistic regression for complete pairs and the missing-indicator method for all pairs. For 1:1 matched studies, given no confounding between exposure and disease, the two methods yielded unbiased estimates. Otherwise, conditional logistic regression produced unbiased estimates with empirical confidence interval coverage similar to nominal coverage under the first three missing-value scenarios, whereas the missing-indicator method produced slightly more bias and lower confidence interval coverage. An increased number of matched controls was associated with slightly more bias and lower confidence interval coverage. Under the case/exposure-dependent missing-value scenario, neither method performed satisfactorily; this indicates the need for more sophisticated statistical methods for handling such missing values. Overall, compared with the missing-indicator method, conditional logistic regression provided a slight advantage in terms of bias and coverage probability, at the cost of slightly reduced statistical power and efficiency.

2.
Individual participant data meta-analyses (IPD-MA) are increasingly used for developing and validating multivariable (diagnostic or prognostic) risk prediction models. Unfortunately, some predictors or even outcomes may not have been measured in each study and are thus systematically missing in some individual studies of the IPD-MA. As a consequence, it is no longer possible to evaluate between-study heterogeneity and to estimate study-specific predictor effects, or to include all individual studies, which severely hampers the development and validation of prediction models. Here, we describe a novel approach for imputing systematically missing data and adopt a generalized linear mixed model to allow for between-study heterogeneity. This approach can be viewed as an extension of Resche-Rigon's method (Stat Med 2013), relaxing their assumptions regarding variance components and allowing imputation of linear and nonlinear predictors. We illustrate our approach using a case study with IPD-MA of 13 studies to develop and validate a diagnostic prediction model for the presence of deep venous thrombosis. We compare the results after applying four methods for dealing with systematically missing predictors in one or more individual studies: complete case analysis where studies with systematically missing predictors are removed, traditional multiple imputation ignoring heterogeneity across studies, stratified multiple imputation accounting for heterogeneity in predictor prevalence, and multilevel multiple imputation (MLMI) fully accounting for between-study heterogeneity. We conclude that MLMI may substantially improve the estimation of between-study heterogeneity parameters and allow for imputation of systematically missing predictors in IPD-MA aimed at the development and validation of prediction models. Copyright © 2015 John Wiley & Sons, Ltd.

3.
Review: a gentle introduction to imputation of missing values   (Total citations: 1; self-citations: 0; citations by others: 1)
In most situations, simple techniques for handling missing data (such as complete case analysis, overall mean imputation, and the missing-indicator method) produce biased results, whereas imputation techniques yield valid results without complicating the analysis once the imputations are carried out. Imputation techniques are based on the idea that any subject in a study sample can be replaced by a new randomly chosen subject from the same source population. Imputation of missing data on a variable means replacing each missing value with a value drawn from an estimate of the distribution of this variable. In single imputation, only one estimate is used. In multiple imputation, various estimates are used, reflecting the uncertainty in the estimation of this distribution. Under the general conditions of so-called missing at random and missing completely at random, both single and multiple imputations result in unbiased estimates of study associations. But single imputation results in estimated standard errors that are too small, whereas multiple imputation results in correctly estimated standard errors and confidence intervals. In this article we explain why all this is the case, and use a simple simulation study to demonstrate our explanations. We also explain and illustrate why two frequently used methods to handle missing data, i.e., overall mean imputation and the missing-indicator method, almost always result in biased estimates.
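The point above about standard errors can be illustrated with a minimal sketch of Rubin's rules for pooling results across imputed data sets. This is not code from the article; the function name `pool_rubin` and the numbers in the usage lines are hypothetical, chosen only to show that the pooled standard error adds a between-imputation component that a single imputation cannot supply.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules).

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each estimate
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)


# Hypothetical results from five imputed data sets, each with SE = 0.2.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04] * 5
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(pooled_estimate, pooled_se)
```

With these illustrative inputs the pooled standard error is larger than the 0.2 reported by any single imputed data set, because the spread of the estimates across imputations is counted as additional uncertainty.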

4.
Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice since it is possible to use ordinary statistical methods for a complete data set once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have also been proposed for MI, but it is not straightforward to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset.

5.
Although missing outcome data are an important problem in randomized trials and observational studies, methods to address this issue can be difficult to apply. Using simulated data, the authors compared 3 methods to handle missing outcome data: 1) complete case analysis; 2) single imputation; and 3) multiple imputation (all 3 with and without covariate adjustment). Simulated scenarios focused on continuous or dichotomous missing outcome data from randomized trials or observational studies. When outcomes were missing at random, single and multiple imputations yielded unbiased estimates after covariate adjustment. Estimates obtained by complete case analysis with covariate adjustment were unbiased as well, with coverage close to 95%. When outcome data were missing not at random, all methods gave biased estimates, but handling missing outcome data by means of 1 of the 3 methods reduced bias compared with a complete case analysis without covariate adjustment. Complete case analysis with covariate adjustment and multiple imputation yield similar estimates in the event of missing outcome data, as long as the same predictors of missingness are included. Hence, complete case analysis with covariate adjustment can and should be used as the analysis of choice more often. Multiple imputation, in addition, can accommodate the missing-not-at-random scenario more flexibly, making it especially suited for sensitivity analyses.

6.
Propensity score models are frequently used to estimate causal effects in observational studies. One unresolved issue in fitting these models is handling missing values in the propensity score model covariates. As these models usually contain a large set of covariates, using only individuals with complete data significantly decreases the sample size and statistical power. Several missing data imputation approaches have been proposed, including multiple imputation (MI), MI with missingness pattern (MIMP), and treatment mean imputation. Generalized boosted modeling (GBM), which is a nonparametric approach to estimate propensity scores, can automatically handle missingness in the covariates. Although the performance of MI, MIMP, and treatment mean imputation has previously been compared for binary treatments, these approaches have not been compared for continuous exposures or with single imputation and GBM. We compared these approaches in estimating the generalized propensity score (GPS) for a continuous exposure in both a simulation study and in empirical data. Using GBM with the incomplete data to estimate the GPS did not perform well in the simulation. Missing values should be imputed before estimating propensity scores using GBM or any other approach for estimating the GPS.

7.
Missing values, common in epidemiologic studies, are a major issue in obtaining valid estimates. Simulation studies have suggested that multiple imputation is an attractive method for imputing missing values, but it is relatively complex and requires specialized software. For each of 28 studies in the Asia Pacific Cohort Studies Collaboration, this paper presents a comparison of eight imputation procedures (unconditional and conditional mean, multiple hot deck, expectation maximization, and four different approaches to multiple imputation) and the naive, complete participant analysis. Criteria used for comparison were the mean and standard deviation of total cholesterol and the estimated coronary mortality hazard ratio for a one-unit increase in cholesterol. Further sensitivity analyses allowed for systematic over- or underestimation of cholesterol. For 22 studies for which less than 10% of the values for cholesterol were missing, and for the pooled Asia Pacific Cohort Studies Collaboration, all methods gave similar results. For studies with roughly 10-60% missing values, clear differences existed between the methods, in which case past research suggests that multiple imputation is the method of choice. For two studies with over 60% missing values, no imputation method seemed to be satisfactory.

8.
Value in Health, 2022, 25(9): 1654-1662
Objectives: Cost-effectiveness analysis (CEA) alongside randomized controlled trials often relies on self-reported multi-item questionnaires that are invariably prone to missing item-level data. The purpose of this study is to review how missing multi-item questionnaire data are handled in trial-based CEAs. Methods: We searched the National Institute for Health Research journals to identify within-trial CEAs published between January 2016 and April 2021 using multi-item instruments to collect costs and quality of life (QOL) data. Information on missing data handling and methods, with a focus on the level and type of imputation, was extracted. Results: A total of 87 trial-based CEAs were included in the review. Complete case analysis or available case analysis and multiple imputation (MI) were the most popular methods, selected by similar numbers of studies, to handle missing costs and QOL in base-case analysis. Nevertheless, complete case analysis or available case analysis dominated sensitivity analysis. Once imputation was chosen, missing costs were widely imputed at item-level via MI, whereas missing QOL was usually imputed at the more aggregated time point level during the follow-up via MI. Conclusions: Missing costs and QOL tend to be imputed at different levels of missingness in current CEAs alongside randomized controlled trials. Given the limited information provided by included studies, the impact of applying different imputation methods at different levels of aggregation on CEA decision making remains unclear.

9.
Missing data occur frequently in meta-analysis. Reviewers inevitably face decisions about how to handle missing data, especially when predictors in a model of effect size are missing from some of the identified studies. Commonly used methods for missing data such as complete case analysis and mean substitution often yield biased estimates. This article briefly reviews the particular problems missing predictors cause in a meta-analysis, discusses the properties of commonly used missing data methods, and provides suggestions for ways to handle missing predictors when estimating effect size models. Maximum likelihood methods for multivariate normal data and multiple imputation hold the most promise for handling missing predictors in meta-analysis. These two model-based methods apply to a broad set of data situations, are based on sound statistical theory, and utilize all information available to obtain efficient estimators.

10.
Several approaches exist for handling missing covariates in the Cox proportional hazards model. Multiple imputation (MI) is relatively easy to implement with various software available and results in consistent estimates if the imputation model is correct. On the other hand, the fully augmented weighted estimators (FAWEs) recover a substantial proportion of the efficiency and have the doubly robust property. In this paper, we compare the FAWEs and MI through a comprehensive simulation study. For MI, we consider multiple imputation by chained equations and focus on two imputation methods: Bayesian linear regression imputation and predictive mean matching. Simulation results show that the imputation methods can be rather sensitive to model misspecification and may have large bias when the censoring time depends on the missing covariates. In contrast, the FAWEs allow the censoring time to depend on the missing covariates and, owing to the doubly robust property, are remarkably robust as long as either the conditional expectations or the selection probability is correctly specified. The comparison suggests that the FAWEs have the potential to be a competitive and attractive tool for tackling the analysis of survival data with missing covariates. Copyright © 2010 John Wiley & Sons, Ltd.

11.
OBJECTIVE: To evaluate the effects of missing data on analyses of data from trauma databases, and to verify whether commonly used techniques for handling missing data work well in these settings. STUDY DESIGN AND SETTING: Measures of trauma severity such as the Pre-Hospital Index (PHI) are used for triage and the evaluation of trauma care. As conditions of trauma patients can rapidly change over time, estimating the change in PHI from the arrival at the emergency room to hospital admission is important. We used both simulated and real data to investigate the estimation of PHI data when some data are missing. Techniques compared include complete case analysis, single imputation, and multiple imputation. RESULTS: It is well known that complete case analyses and single imputation methods often lead to highly misleading results that can be corrected by multiple imputation, an increasingly popular method for missing data situations. In practice, unverifiable assumptions may not hold, meaning that it may not be possible to draw definitive conclusions from any of the methods. CONCLUSION: Great care is required whenever missing data arise. This is especially true in trauma databases, which often have much missing data and where the data may not be missing at random.

12.
In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure–disease association studies are therefore often based on nested case–control or case–cohort studies in which complete information is obtained only for sampled individuals. However, in the full cohort, there may be a large amount of information on cheaply available covariates and possibly a surrogate of the main exposure(s), which typically goes unused. We view the nested case–control or case–cohort study plus the remainder of the cohort as a full-cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub-studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full-cohort information in the analysis of nested case–control and case–cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter-matching in nested case–control studies and a weighted analysis for case–cohort studies, both of which use some full-cohort information. Approximate imputation models perform well except when there are interactions or non-linear terms in the outcome model, where imputation using rejection sampling works well. Copyright © 2013 John Wiley & Sons, Ltd.

13.
BACKGROUND AND OBJECTIVE: Epidemiologic studies commonly estimate associations between predictors (risk factors) and outcome. Most software automatically excludes subjects with missing values. This commonly causes bias because missing values seldom occur completely at random (MCAR) but rather selectively based on other (observed) variables, that is, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information including outcome is advocated to deal with selective missing values. This seems a self-fulfilling prophecy. METHODS: We tested this hypothesis using data from a study on diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs) estimated from the original sample were considered as "true" values. We assigned missing values to these predictors (both MCAR and MAR) and repeated this 1,000 times using simulations. Per simulation we multiply imputed the missing values without and with the outcome, and compared the regression coefficients and SEs to the truth. RESULTS: Regression coefficients based on MI including outcome were close to the truth. MI without outcome yielded very biased (underestimated) coefficients. SEs and coverage of the 90% confidence intervals were not different between MI with and without outcome. Results were the same for MCAR and MAR. CONCLUSION: For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without outcome and is no self-fulfilling prophecy.

14.
Objective: Missing indicator method (MIM) and complete case analysis (CC) are frequently used to handle missing confounder data. Using empirical data, we demonstrated the degree and direction of bias in the effect estimate when using these methods compared with multiple imputation (MI). Study Design and Setting: From a cohort study, we selected an exposure (marital status), outcome (depression), and confounders (age, sex, and income). Missing values in “income” were created according to different patterns of missingness: missing values were created completely at random and depending on exposure and outcome values. Percentages of missing values ranged from 2.5% to 30%. Results: When missing values were completely random, MIM gave an overestimation of the odds ratio, whereas CC and MI gave unbiased results. MIM and CC gave under- or overestimations when missing values depended on observed values. Magnitude and direction of bias depended on how the missing values were related to exposure and outcome. Bias increased with increasing percentage of missing values. Conclusion: MIM should not be used in handling missing confounder data because it gives unpredictable bias of the odds ratio even with small percentages of missing values. CC can be used when missing values are completely random, but it gives loss of statistical power.

15.
The problem of missing values has increasingly been recognized in epidemiology. New methods allow for the analysis of missing data that can provide valid estimates of epidemiological quantities of interest. The GISSI-Prevenzione study aimed to reliably assess the long-term relationship between the consumption of foods typical of the Mediterranean diet and the risk of mortality amongst 11,323 Italians with prior myocardial infarction. Food intake frequencies were recorded repeatedly over the 4.5 years of follow-up and missing values affected each food variable at increasing rates over the course of the study. Comparisons were made between the results obtained from the analysis of the complete data and those obtained after imputing the missing data with simple imputation methods and with various implementations of the multiple imputation (MI) method. MI appeared to best address the issue of missing data on the food intake frequencies, preserving the observed distributions and relationships between variables whilst producing plausible estimates of variability. Given its theoretical properties and flexibility to different types of data, MI is more likely to provide valid estimates, compared to complete data analysis and imputation by simple methods, and is thus worthy of wider consideration amongst epidemiological researchers.

16.
What is the influence of various methods of handling missing data (complete case analyses, single imputation within and over trials, and multiple imputations within and over trials) on the subgroup effects of individual patient data meta-analyses? An empirical data set was used to compare these five methods regarding the subgroup results. Logistic regression analyses were used to determine interaction effects (regression coefficients, standard errors, and p values) between subgrouping variables and treatment. Stratified analyses were performed to determine the effects in subgroups (rate ratios, rate differences, and their 95% confidence intervals). Imputation over trials resulted in different regression coefficients and standard errors of the interaction term as compared with imputation within trials and complete case analyses. Significant interaction effects were found for complete case analyses and imputation within trials, whereas imputation over trials often showed no significant interaction effect. Imputation of missing data over trials might lead to bias, because association of covariates might differ across the included studies. Therefore, despite the gain in statistical power, imputation over trials is not recommended. In the authors' empirical example, imputation within trials appears to be the most appropriate approach of handling missing data in individual patient data meta-analyses.

17.
The treatment of missing data in comparative effectiveness studies with right-censored outcomes and time-varying covariates is challenging because of the multilevel structure of the data. In particular, the performance of an accessible method like multiple imputation (MI) under an imputation model that ignores the multilevel structure is unknown and has not been compared to complete-case (CC) and single imputation methods that are most commonly applied in this context. Through an extensive simulation study, we compared statistical properties among CC analysis, last value carried forward, mean imputation, the use of missing indicators, and MI-based approaches with and without auxiliary variables under an extended Cox model when the interest lies in characterizing relationships between non-missing time-varying exposures and right-censored outcomes. MI demonstrated favorable properties under a moderate missing-at-random condition (absolute bias <0.1) and outperformed CC and single imputation methods, even when the MI method did not account for correlated observations in the imputation model. The performance of MI decreased with increasing complexity such as when the missing data mechanism involved the exposure of interest, but was still preferred over other methods considered and performed well in the presence of strong auxiliary variables. We recommend considering MI that ignores the multilevel structure in the imputation model when data are missing in a time-varying confounder, incorporating variables associated with missingness in the MI models as well as conducting sensitivity analyses across plausible assumptions.

18.
Objective: To illustrate the sequence of steps needed to develop and validate a clinical prediction model, when missing predictor values have been multiply imputed. Study Design and Setting: We used data from consecutive primary care patients suspected of deep venous thrombosis (DVT) to develop and validate a diagnostic model for the presence of DVT. Missing values were imputed 10 times with the MICE conditional imputation method. After the selection of predictors and transformations for continuous predictors according to three different methods, we estimated regression coefficients and performance measures. Results: The three methods to select predictors and transformations of continuous predictors showed similar results. Rubin's rules could easily be applied to estimate regression coefficients and performance measures, once predictors and transformations were selected. Conclusion: We provide a practical approach for model development and validation with multiply imputed data.

19.
Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

20.
When collecting patient-level resource use data for statistical analysis, for some patients and in some categories of resource use, the required count will not be observed. Although this problem must arise in most reported economic evaluations containing patient-level data, it is rare for authors to detail how the problem was overcome. Statistical packages may default to handling missing data through a so-called 'complete case analysis', while some recent cost-analyses have appeared to favour an 'available case' approach. Both of these methods are problematic: complete case analysis is inefficient and is likely to be biased; available case analysis, by employing different numbers of observations for each resource use item, generates severe problems for standard statistical inference. Instead we explore imputation methods for generating 'replacement' values for missing data that will permit complete case analysis using the whole data set and we illustrate these methods using two data sets that had incomplete resource use information.

Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.). ICP registration: 京ICP备09084417号