首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

2.

Background

Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with Multiple imputation (MI) for dealing with missing data in an analysis of the association of waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).

Methods

We generated 1,000 datasets of 41,476 individuals with values of waist circumference at waves 1 and 2 and times to the events of colorectal cancer and death to resemble the distributions of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15, 30 and 50%) were imposed on waist circumference at wave 2 using three missing data mechanisms: Missing Completely at Random (MCAR), and a realistic and a more extreme covariate-dependent Missing at Random (MAR) scenarios. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.

Results

We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference was included as a strong auxiliary variable in the imputation model.

Conclusions

This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI compared to a complete-case analysis in the presence of up to 50% missing data for the exposure of interest when the data are MCAR, or missing dependent on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model.
  相似文献   

3.

Background

In molecular epidemiology studies biospecimen data are collected, often with the purpose of evaluating the synergistic role between a biomarker and another feature on an outcome. Typically, biomarker data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not customarily incorporated into analyses. Instead, complete-case (CC) analyses are performed, which can result in biased and inefficient estimates.

Methods

Through simulations, we characterized the performance of CC methods when interaction effects are estimated. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist.

Results

CC analyses were shown to result in considerable bias and efficiency loss. While MI reduced bias and increased efficiency over CC methods under specific conditions, it too resulted in biased estimates depending on the strength of the auxiliary data available and the nature of the missingness. In particular, CC performed better than MI when extreme values of the covariate were more likely to be missing, while MI outperformed CC when missingness of the covariate related to both the covariate and outcome. MI always improved performance when strong auxiliary data were available. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach.

Conclusions

Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Auxiliary variables may make this assumption more reasonable even if the data are NMAR. Under NMAR we emphasize caution when using standard MI and recommend it over CC only when strong auxiliary data are available. MI, with the missing data mechanism specified, is an alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI's ability to account for the uncertainty of these assumptions.  相似文献   

4.
BACKGROUND AND OBJECTIVE: Epidemiologic studies commonly estimate associations between predictors (risk factors) and outcome. Most software automatically exclude subjects with missing values. This commonly causes bias because missing values seldom occur completely at random (MCAR) but rather selectively based on other (observed) variables, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information including outcome is advocated to deal with selective missing values. This seems a self-fulfilling prophecy. METHODS: We tested this hypothesis using data from a study on diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs) estimated from the original sample were considered as "true" values. We assigned missing values to these predictors--both MCAR and MAR--and repeated this 1,000 times using simulations. Per simulation we multiple imputed the missing values without and with the outcome, and compared the regression coefficients and SEs to the truth. RESULTS: Regression coefficients based on MI including outcome were close to the truth. MI without outcome yielded very biased--underestimated--coefficients. SEs and coverage of the 90% confidence intervals were not different between MI with and without outcome. Results were the same for MCAR and MAR. CONCLUSION: For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without outcome and is no self-fulfilling prophecy.  相似文献   

5.

Background  

Missing data is classified as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). Knowing the mechanism is useful in identifying the most appropriate analysis. The first aim was to compare different methods for identifying this missing data mechanism to determine if they gave consistent conclusions. Secondly, to investigate whether the reminder-response data can be utilised to help identify the missing data mechanism.  相似文献   

6.
ObjectiveMissing data due to study dropout is common in weight loss trials and several statistical methods exist to account for it. The aim of this study was to identify methods in the literature and to compare the effects of methods of analysis using simulated data sets.MethodsLiterature was obtained for a 1-y period to identify analytical methods used in reporting weight loss trials. A comparison of methods with large or small between-group weight loss, and missing data that was, or was not, missing randomly was conducted in simulated data sets based on previous research.ResultsTwenty-seven studies, some with multiple analyses, were retrieved. Complete case analysis (n = 17), last observation carried forward (n = 6), baseline carried forward (n = 4), maximum likelihood (n = 6), and multiple imputation (n = 2) were the common methods of accounting for missing data. When comparing methods on simulated data, all demonstrated a significant effect when the between-group weight loss was large (P < 0.001, interaction term) regardless of whether the data was missing completely at random. When the weight loss interaction was small, the method used for analysis gave considerably different results with mixed models (P = 0.180) and multiple imputations (P = 0.125) closest to the full data model (P = 0.033).ConclusionThe simulation analysis showed that when data were not missing at random, treatment effects were small, and the amount of missing data was substantial, the analysis method had an effect on the significance of the outcome. Careful attention must be paid when analyzing or appraising studies with missing data and small effects to ensure appropriate conclusions are drawn.  相似文献   

7.
ObjectiveMissing data are a pervasive problem, often leading to bias in complete records analysis (CRA). Multiple imputation (MI) via chained equations is one solution, but its use in the presence of interactions is not straightforward.Study Design and SettingWe simulated data with outcome Y dependent on binary explanatory variables X and Z and their interaction XZ. Six scenarios were simulated (Y continuous and binary, each with no interaction, a weak and a strong interaction), under five missing data mechanisms. We use directed acyclic graphs to identify when CRA and MI would each be unbiased. We evaluate the performance of CRA, MI without interactions, MI including all interactions, and stratified imputation. We also illustrated these methods using a simple example from the National Child Development Study (NCDS).ResultsMI excluding interactions is invalid and resulted in biased estimates and low coverage. When XZ was zero, MI excluding interactions gave unbiased estimates but overcoverage. MI including interactions and stratified MI gave equivalent, valid inference in all cases. In the NCDS example, MI excluding interactions incorrectly concluded there was no evidence for an important interaction.ConclusionsEpidemiologists carrying out MI should ensure that their imputation model(s) are compatible with their analysis model.  相似文献   

8.
ObjectiveTo assess the added value of multiple imputation (MI) of missing repeated outcomes measures in longitudinal data sets analyzed with linear mixed-effects (LME) models.Study Design and SettingData were used from a trial on the effects of Rosuvastatin on rate of change in carotid intima-media thickness (CIMT). The reference treatment effect was derived from a complete data set. Scenarios and proportions of missing values in CIMT measurements were applied and LME analyses were used before and after MI. The added value of MI, in terms of bias and precision, was assessed using the mean-squared error (MSE) of the treatment effects and coverage of the 95% confidence interval.ResultsThe reference treatment effect was ?0.0177 mm/y. The MSEs for LME analysis without and with MI were similar in scenarios with up to 40% missing values. Coverage was large in all scenarios and was similar for LME with and without MI.ConclusionOur study empirically shows that MI of missing end point data before LME analyses does not increase precision in the estimated rate of change in the end point. Hence, MI had no added value in this setting and standard LME modeling remains the method of choice.  相似文献   

9.
Introduction

For the analysis of clinical effects, multiple imputation (MI) of missing data were shown to be unnecessary when using longitudinal linear mixed-models (LLM). It remains unclear whether this also applies to trial-based economic evaluations. Therefore, this study aimed to assess whether MI is required prior to LLM when analyzing longitudinal cost and effect data.

Methods

Two-thousand complete datasets were simulated containing five time points. Incomplete datasets were generated with 10, 25, and 50% missing data in follow-up costs and effects, assuming a Missing At Random (MAR) mechanism. Six different strategies were compared using empirical bias (EB), root-mean-squared error (RMSE), and coverage rate (CR). These strategies were: LLM alone (LLM) and MI with LLM (MI-LLM), and, as reference strategies, mean imputation with LLM (M-LLM), seemingly unrelated regression alone (SUR-CCA), MI with SUR (MI-SUR), and mean imputation with SUR (M-SUR).

Results

For costs and effects, LLM, MI-LLM, and MI-SUR performed better than M-LLM, SUR-CCA, and M-SUR, with smaller EBs and RMSEs as well as CRs closers to nominal levels. However, even though LLM, MI-LLM and MI-SUR performed equally well for effects, MI-LLM and MI-SUR were found to perform better than LLM for costs at 10 and 25% missing data. At 50% missing data, all strategies resulted in relatively high EBs and RMSEs for costs.

Conclusion

LLM should be combined with MI when analyzing trial-based economic evaluation data. MI-SUR is more efficient and can also be used, but then an average intervention effect over time cannot be estimated.

  相似文献   

10.
We propose a propensity score-based multiple imputation (MI) method to tackle incomplete missing data resulting from drop-outs and/or intermittent skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.  相似文献   

11.
Missing data are common in longitudinal studies due to drop‐out, loss to follow‐up, and death. Likelihood‐based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood‐based joint modeling approach is consistent and most efficient when correctly specified. However, mis‐specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis‐specified, though it is more robust to mis‐specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

12.
We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.  相似文献   

13.

Background

When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI).

Methods

Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1–0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete.

Results

Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest.

Conclusions

In longitudinal studies with loss to follow-up, incorporating proxies for this study outcome obtained via linkage to external sources of data as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
  相似文献   

14.
Missing values are common in epidemiological data, and yet their possible effect on the results of the investigation is seldom quantified despite the fact that they are a likely source of bias. This paper describes a simple method, based on odds ratios, for assessing whether missing values are likely to cause bias, and for quantifying the magnitude of any such bias. The method is applied to the Scottish Heart Health Study, a study of risk factors for coronary heart disease amongst 10,359 men and women. It is found that, although no bias is apparent in the male subjects, females with missing blood samples are twice as likely to have a history of myocardial infarction than other women. Missing blood samples are shown to be associated with increased body mass index and difficult peripheral veins. The bias caused by the missing female blood samples is considerable, with estimated errors of between 4% and 12% in the mean values of total cholesterol, HDL-cholesterol, triglycerides, creatinine, uric acid and fibrinogen.  相似文献   

15.
The treatment of missing data in comparative effectiveness studies with right-censored outcomes and time-varying covariates is challenging because of the multilevel structure of the data. In particular, the performance of an accessible method like multiple imputation (MI) under an imputation model that ignores the multilevel structure is unknown and has not been compared to complete-case (CC) and single imputation methods that are most commonly applied in this context. Through an extensive simulation study, we compared statistical properties among CC analysis, last value carried forward, mean imputation, the use of missing indicators, and MI-based approaches with and without auxiliary variables under an extended Cox model when the interest lies in characterizing relationships between non-missing time-varying exposures and right-censored outcomes. MI demonstrated favorable properties under a moderate missing-at-random condition (absolute bias <0.1) and outperformed CC and single imputation methods, even when the MI method did not account for correlated observations in the imputation model. The performance of MI decreased with increasing complexity such as when the missing data mechanism involved the exposure of interest, but was still preferred over other methods considered and performed well in the presence of strong auxiliary variables. We recommend considering MI that ignores the multilevel structure in the imputation model when data are missing in a time-varying confounder, incorporating variables associated with missingness in the MI models as well as conducting sensitivity analyses across plausible assumptions.  相似文献   

16.
Background and ObjectivesAs a result of the development of sophisticated techniques, such as multiple imputation, the interest in handling missing data in longitudinal studies has increased enormously in past years. Within the field of longitudinal data analysis, there is a current debate on whether it is necessary to use multiple imputations before performing a mixed-model analysis to analyze the longitudinal data. In the current study this necessity is evaluated.Study Design and SettingThe results of mixed-model analyses with and without multiple imputation were compared with each other. Four data sets with missing values were created—one data set with missing completely at random, two data sets with missing at random, and one data set with missing not at random). In all data sets, the relationship between a continuous outcome variable and two different covariates were analyzed: a time-independent dichotomous covariate and a time-dependent continuous covariate.ResultsAlthough for all types of missing data, the results of the mixed-model analysis with or without multiple imputations were slightly different, they were not in favor of one of the two approaches. In addition, repeating the multiple imputations 100 times showed that the results of the mixed-model analysis with multiple imputation were quite unstable.ConclusionIt is not necessary to handle missing data using multiple imputations before performing a mixed-model analysis on longitudinal data.  相似文献   

17.
OBJECTIVES: The aim of this study was to quantify bias from a partially ecologic analysis due to (i) model misspecification and (ii) an unmeasured confounder, considering various scenarios that may occur in occupational and environmental epidemiology. A study with an aggregate exposure variable, PE, but with outcome, group membership, and covariates assessed individually is partially ecologic. In this paper, PE is the proportion exposed; PE can vary across geographic areas or occupational groups. METHODS: Several hypothetical scenarios were considered, varying the baseline risk, the exposure effect, the exposure distribution across groups, the impact of the (unmeasured) confounder, and the confounder distribution across groups. First, confounding within groups was introduced. Thereafter, confounding between groups was introduced by co-varying PE and the confounder prevalence across the groups. The expected odds ratio (exposed versus unexposed) was calculated in two alternative models, the logistic regression and linear odds models, both with PE as the independent variable. Moreover, empirical data on noise exposure and sleeping disturbances were analyzed. RESULTS: Compared with the logistic regression model, the linear odds model yielded a markedly less biased odds ratio (OR) when the outcome was rare (< or = 5% baseline risk). Confounding within groups resulted in marginal bias, whereas confounding between groups resulted in more pronounced bias. CONCLUSIONS: A logistic regression analysis, with PE as an independent variable, can yield substantial model misspecification bias. By contrast, the linear odds model is valid when the outcome is rare. Confounding between groups should be of more concern than confounding within groups in partially ecologic analyses.  相似文献   

18.
Missing outcome data are commonly encountered in randomized controlled trials and hence may need to be addressed in a meta‐analysis of multiple trials. A common and simple approach to deal with missing data is to restrict analysis to individuals for whom the outcome was obtained (complete case analysis). However, estimated treatment effects from complete case analyses are potentially biased if informative missing data are ignored. We develop methods for estimating meta‐analytic summary treatment effects for continuous outcomes in the presence of missing data for some of the individuals within the trials. We build on a method previously developed for binary outcomes, which quantifies the degree of departure from a missing at random assumption via the informative missingness odds ratio. Our new model quantifies the degree of departure from missing at random using either an informative missingness difference of means or an informative missingness ratio of means, both of which relate the mean value of the missing outcome data to that of the observed data. We propose estimating the treatment effects, adjusted for informative missingness, and their standard errors by a Taylor series approximation and by a Monte Carlo method. We apply the methodology to examples of both pairwise and network meta‐analysis with multi‐arm trials. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.  相似文献   

19.

Background

Missing values are commonly encountered on the Mini Mental State Examination (MMSE), particularly when administered to frail older people. This presents challenges for MMSE scoring in research settings. We sought to describe missingness in MMSEs administered in long-term-care facilities (LTCF) and to compare and contrast approaches to dealing with missing items.

Methods

As part of the Care and Construction project in Nova Scotia, Canada, LTCF residents completed an MMSE. Different methods of dealing with missing values (e.g., use of raw scores, raw scores/number of items attempted, scale-level multiple imputation [MI], and blended approaches) are compared to item-level MI.

Results

The MMSE was administered to 320 residents living in 23 LTCF. The sample was predominately female (73%), and 38% of participants were aged >85 years. At least one item was missing from 122 (38.2%) of the MMSEs. Data were not Missing Completely at Random (MCAR), χ2 (1110) = 1,351, p < 0.001. Using raw scores for those missing <6 items in combination with scale-level MI resulted in the regression coefficients and standard errors closest to item-level MI.

Conclusions

Patterns of missing items often suggest systematic problems, such as trouble with manual dexterity, literacy, or visual impairment. While these observations may be relatively easy to take into account in clinical settings, non-random missingness presents challenges for research and must be considered in statistical analyses. We present suggestions for dealing with missing MMSE data based on the extent of missingness and the goal of analyses.  相似文献   

20.
Missing data due to loss to follow-up or intercurrent events are unintended, but unfortunately inevitable in clinical trials. Since the true values of missing data are never known, it is necessary to assess the impact of untestable and unavoidable assumptions about any unobserved data in sensitivity analysis. This tutorial provides an overview of controlled multiple imputation (MI) techniques and a practical guide to their use for sensitivity analysis of trials with missing continuous outcome data. These include δ- and reference-based MI procedures. In δ-based imputation, an offset term, δ, is typically added to the expected value of the missing data to assess the impact of unobserved participants having a worse or better response than those observed. Reference-based imputation draws imputed values with some reference to observed data in other groups of the trial, typically in other treatment arms. We illustrate the accessibility of these methods using data from a pediatric eczema trial and a chronic headache trial and provide Stata code to facilitate adoption. We discuss issues surrounding the choice of δ in δ-based sensitivity analysis. We also review the debate on variance estimation within reference-based analysis and justify the use of Rubin's variance estimator in this setting, since as we further elaborate on within, it provides information anchored inference.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号