Similar Articles
1.
Attrition threatens the internal validity of cohort studies. Epidemiologists use various imputation and weighting methods to limit bias due to attrition. However, the ability of these methods to correct for attrition bias has not been tested. We simulated a cohort of 300 subjects using 500 computer replications to determine whether regression imputation, individual weighting, or multiple imputation is useful to reduce attrition bias. We compared these results to a complete subject analysis. Our logistic regression model included a binary exposure and two confounders. We generated 10, 25, and 40% attrition through three missing data mechanisms: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR), and used four covariance matrices to vary attrition. We compared true and estimated mean odds ratios (ORs), standard deviations (SDs), and coverage. With data MCAR and MAR for all attrition rates, the complete subject analysis produced results at least as valid as those from the imputation and weighting methods. With data MNAR, no method provided unbiased estimates of the OR at attrition rates of 25 or 40%. When observations are not MAR or MCAR, imputation and weighting methods may not effectively reduce attrition bias.
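The three missingness mechanisms this simulation varies can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' code: the coefficients, the logistic data-generating model, and the 300-subject setup are assumptions chosen to mimic the described design.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300                                    # cohort size used in the simulation above
x = rng.binomial(1, 0.5, n)                # binary exposure
c = rng.normal(size=n)                     # a confounder
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x + 0.8 * c))))  # binary outcome

def attrition_mask(mechanism, rate=0.25):
    """Return True for subjects lost to attrition under the given mechanism."""
    base = np.log(rate / (1 - rate))       # logit of the target attrition rate
    if mechanism == "MCAR":                # loss independent of all variables
        p = np.full(n, rate)
    elif mechanism == "MAR":               # loss depends on the observed confounder
        p = 1 / (1 + np.exp(-(base + 1.0 * c)))
    else:                                  # "MNAR": loss depends on the outcome itself
        p = 1 / (1 + np.exp(-(base + 1.5 * y)))
    return rng.random(n) < p

lost = attrition_mask("MNAR")              # a complete-subject analysis drops these rows
```

Under MCAR the mask is pure noise; under MAR it can be modeled from observed data; under MNAR it depends on the unobserved outcome, which is why no standard correction recovers the OR in that scenario.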

2.
BACKGROUND AND OBJECTIVES: To illustrate, in the context of multivariable diagnostic research aiming to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence, the effects of different methods for handling missing data: complete case analysis, the missing-indicator method, single imputation of the unconditional and conditional mean, and multiple imputation (MI). METHODS: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had (varying percentages of) missing values. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis. RESULTS: The receiver operating characteristic curve area for all diagnostic models was above 0.75. The predictors in the final models based on the complete case analysis, and after using the missing-indicator method, were very different compared to the other models. The models based on MI did not differ much from the models derived after using single conditional and unconditional mean imputation. CONCLUSION: In multivariable diagnostic research, complete case analysis and the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. For our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.
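The two single-imputation strategies compared above differ only in whether the replacement value conditions on other observed predictors. A minimal numpy sketch, with hypothetical variable names (`d_dimer`, `age`) and a made-up data-generating model, not data from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 398                                      # sample size of the example study
age = rng.normal(60, 10, n)                  # fully observed predictor
d_dimer = np.exp(0.02 * age + rng.normal(0, 0.5, n))  # hypothetical skewed test result
d_dimer[rng.random(n) < 0.2] = np.nan        # ~20% missing values
obs = ~np.isnan(d_dimer)

# Unconditional mean imputation: every missing value gets the overall observed mean.
uncond = d_dimer.copy()
uncond[~obs] = d_dimer[obs].mean()

# Conditional mean imputation: predict each missing value from a regression
# of the test result on the observed predictor.
slope, intercept = np.polyfit(age[obs], d_dimer[obs], deg=1)
cond = d_dimer.copy()
cond[~obs] = intercept + slope * age[~obs]
```

The unconditional version collapses all imputed values to a single constant, shrinking variance and attenuating associations; the conditional version preserves the relationship with `age` but still understates uncertainty, which is what MI's multiple draws are designed to fix.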

3.
Out of sight, not out of mind: strategies for handling missing data
OBJECTIVE: To describe and illustrate missing data mechanisms (MCAR, MAR, NMAR) and missing data techniques (MDTs) and offer recommended best practices for addressing missingness. METHOD: We simulated data sets and employed ad hoc MDTs (deletion techniques, mean substitution) and sophisticated MDTs (full information maximum likelihood, Bayesian estimation, multiple imputation) in linear regression analyses. RESULTS: MCAR data yielded unbiased parameter estimates across all MDTs, but loss of power with deletion methods. NMAR results were biased towards larger values and greater significance. Under MAR the sophisticated MDTs returned estimates closer to their original values. CONCLUSION: State-of-the-art, readily available MDTs outperform ad hoc techniques.

4.
During drug development, a key step is the identification of relevant covariates predicting between-subject variations in drug response. The full random effects model (FREM) is one of the full-covariate approaches used to identify relevant covariates in nonlinear mixed effects models. Here we explore the ability of FREM to handle covariate data that are missing completely at random (MCAR) or missing at random (MAR) and compare it to the full fixed-effects model (FFEM) approach, applied either with complete case analysis or with mean imputation. A global health dataset (20 421 children) was used to develop a FREM describing the changes in height-for-age Z-score (HAZ) over time. Simulated datasets (n = 1000) were generated with variable rates of MCAR covariate data (0%-90%) and with different proportions of MAR covariate data conditioned on either observed covariates or predicted HAZ. The three methods were used to re-estimate the model and were compared in terms of bias and precision. FREM showed only minor increases in bias and minor losses of precision at increasing percentages of MCAR covariate data and performed similarly in the MAR scenarios. Conversely, the FFEM approaches either collapsed at 70% MCAR covariate data (FFEM with complete case analysis) or showed large increases in bias and losses of precision (FFEM with mean imputation). Our results suggest that FREM is an appropriate approach to covariate modeling for datasets with MCAR or MAR covariate data, such as in global health studies.

5.
Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). This assumption, however, is not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood-based joint modeling approach is consistent and most efficient when correctly specified. However, mis-specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis-specified, though it is more robust to mis-specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study. Copyright © 2009 John Wiley & Sons, Ltd.

6.
Standard implementations of multiple imputation (MI) approaches provide unbiased inferences based on an assumption of underlying missing at random (MAR) mechanisms. However, in the presence of missing data generated by missing not at random (MNAR) mechanisms, MI is not satisfactory. Originating in an econometric statistical context, Heckman's model, also called the sample selection method, deals with selected samples using two joined linear equations, termed the selection equation and the outcome equation. It has been successfully applied to MNAR outcomes. Nevertheless, such a method only addresses missing outcomes, and this is a strong limitation in clinical epidemiology settings, where covariates are also often missing. We propose to extend the validity of MI to some MNAR mechanisms through the use of Heckman's model as the imputation model and a two-step estimation process. This approach provides a solution that can be used in an MI by chained equations framework to impute missing data (either outcomes or covariates) arising from either a MAR or an MNAR mechanism, when the MNAR mechanism is compatible with a Heckman's model. The approach is illustrated on a real dataset from a randomised trial in patients with seasonal influenza. Copyright © 2016 John Wiley & Sons, Ltd.
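The two-step estimation behind Heckman's model can be sketched as follows: fit a probit selection equation, compute the inverse Mills ratio, and add it as an extra regressor in the outcome equation. This is an illustrative simulation, not the authors' implementation; all coefficients are assumed, and for brevity the sketch plugs in the true selection coefficients where a real step 1 would fit a probit.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)                      # covariate driving selection
x = rng.normal(size=n)                      # covariate in the outcome equation
u, v = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], n).T  # correlated errors
selected = (0.5 + 1.0 * z + u) > 0          # selection equation: outcome seen only if True
y = 1.0 + 2.0 * x + v                       # outcome equation (true slope 2.0)

# Step 1 would be a probit fit of `selected` on z; this sketch plugs in the
# true selection coefficients (0.5, 1.0) to keep the example short.
xb = 0.5 + 1.0 * z
pdf = np.exp(-xb ** 2 / 2) / math.sqrt(2 * math.pi)
cdf = np.vectorize(math.erf)(xb / math.sqrt(2)) / 2 + 0.5
mills = pdf / cdf                           # inverse Mills ratio

# Step 2: OLS on the selected sample with the Mills ratio as an extra regressor.
X = np.column_stack([np.ones(selected.sum()), x[selected], mills[selected]])
beta, *_ = np.linalg.lstsq(X, y[selected], rcond=None)
# beta[2] estimates rho * sigma_v (about 0.6 here); beta[1] recovers the slope.
```

In the MI extension described above, this corrected outcome equation serves as the imputation model within the chained-equations cycle, rather than as a final analysis model.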

7.

Background

Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for dealing with missing data in an analysis of the association of waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).

Methods

We generated 1,000 datasets of 41,476 individuals with values of waist circumference at waves 1 and 2 and times to the events of colorectal cancer and death to resemble the distributions of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15, 30 and 50%) were imposed on waist circumference at wave 2 using three missing data mechanisms: Missing Completely at Random (MCAR) and two covariate-dependent Missing at Random (MAR) scenarios, one realistic and one more extreme. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.

Results

We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference was included as a strong auxiliary variable in the imputation model.

Conclusions

This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI compared to a complete-case analysis in the presence of up to 50% missing data for the exposure of interest when the data are MCAR, or missing dependent on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model.

8.
Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing not at random mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given data set, its corresponding population marginal distribution might also be available in an external data source. We show how this information can be readily utilised in the imputation model to calibrate inference to the population by incorporating an appropriately calculated offset termed the “calibrated-δ adjustment.” We describe the derivation of this offset from the population distribution of the incomplete variable and show how, in applications, it can be used to closely (and often exactly) match the post-imputation distribution to the population level. Through analytic and simulation studies, we show that our proposed calibrated-δ adjustment MI method can give the same inference as standard MI when data are MAR, and can produce more accurate inference under two general missing not at random missingness mechanisms. The method is used to impute missing ethnicity data in a type 2 diabetes prevalence case study using UK primary care electronic health records, where it results in scientifically relevant changes in inference for non-White ethnic groups compared with standard MI. Calibrated-δ adjustment MI represents a pragmatic approach for utilising available population-level information in a sensitivity analysis to explore potential departures from the MAR assumption.
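The calibration idea, choosing an offset δ so that the post-imputation marginal matches a known population figure, can be illustrated for a single binary variable. This is a deliberately simplified sketch of the principle, not the paper's MI procedure: it calibrates one imputation probability by bisection on the logit scale, and every prevalence and missingness rate here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
eth = rng.binomial(1, 0.30, n)              # incomplete binary variable; population prevalence 0.30
# MNAR missingness: the level-1 group is missing more often, so the observed
# prevalence underestimates the population figure.
miss = rng.random(n) < np.where(eth == 1, 0.5, 0.2)
obs = ~miss

p_obs = eth[obs].mean()                     # what MAR-based imputation would reproduce
pop_p = 0.30                                # known marginal from an external source

def post_imputation_prev(delta):
    """Marginal prevalence after imputing missing cases with a delta-shifted logit."""
    p_imp = 1 / (1 + np.exp(-(np.log(p_obs / (1 - p_obs)) + delta)))
    return (eth[obs].sum() + p_imp * miss.sum()) / n

# Bisection for the delta that calibrates the marginal to the population value.
lo, hi = -5.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if post_imputation_prev(mid) < pop_p:
        lo = mid
    else:
        hi = mid
delta = (lo + hi) / 2
```

Because the level-1 group is under-represented among complete cases, the calibrated δ comes out positive, pushing imputed values toward level 1 until the marginal matches the external benchmark.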

9.
Objective: To compare the statistical performance of control-based pattern-mixture models (PMM), the mixed-effects model for repeated measures (MMRM), and multiple imputation (MI) when handling quantitative longitudinal missing data with multiple coexisting missingness mechanisms. Methods: Monte Carlo simulation was used to generate quantitative longitudinal datasets containing two or three of the mechanisms missing completely at random, missing at random, and missing not at random, and the statistical performance of the three classes of methods was evaluated. Results: The control-based PMM kept the type I error rate at a low level and had the lowest power. MMRM and MI controlled the type I error rate and had higher power than the control-based PMM. When the two groups did not differ in efficacy, the estimation error of all methods was comparable and the control-based PMM had the highest 95% confidence interval coverage; when they did differ, each method was affected by the proportion of missing data consistent with its assumed missingness mechanism. With missing-not-at-random data, the control-based PMM generally did not overestimate the treatment difference and had the highest 95% confidence interval coverage, whereas MMRM and MI overestimated the difference and had lower coverage. The 95% confidence interval widths of all methods were comparable. Conclusion: When analysing longitudinal missing data with multiple coexisting missingness mechanisms, especially data missing not at random, the statistical performance of MMRM and MI deteriorates; the control-based PMM can be used for sensitivity analysis, but attention should be paid to its specific assumptions to avoid overly conservative estimates.

10.
We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.

11.
Objective: The missing-indicator method (MIM) and complete case analysis (CC) are frequently used to handle missing confounder data. Using empirical data, we demonstrated the degree and direction of bias in the effect estimate when using these methods compared with multiple imputation (MI). Study Design and Setting: From a cohort study, we selected an exposure (marital status), outcome (depression), and confounders (age, sex, and income). Missing values in "income" were created according to different patterns of missingness: missing values were created completely at random and depending on exposure and outcome values. Percentages of missing values ranged from 2.5% to 30%. Results: When missing values were completely random, MIM gave an overestimation of the odds ratio, whereas CC and MI gave unbiased results. MIM and CC gave under- or overestimations when missing values depended on observed values. Magnitude and direction of bias depended on how the missing values were related to exposure and outcome. Bias increased with increasing percentage of missing values. Conclusion: MIM should not be used to handle missing confounder data because it gives unpredictable bias of the odds ratio even with small percentages of missing values. CC can be used when missing values are completely random, but it entails a loss of statistical power.

12.
Although missing outcome data are an important problem in randomized trials and observational studies, methods to address this issue can be difficult to apply. Using simulated data, the authors compared 3 methods to handle missing outcome data: 1) complete case analysis; 2) single imputation; and 3) multiple imputation (all 3 with and without covariate adjustment). Simulated scenarios focused on continuous or dichotomous missing outcome data from randomized trials or observational studies. When outcomes were missing at random, single and multiple imputations yielded unbiased estimates after covariate adjustment. Estimates obtained by complete case analysis with covariate adjustment were unbiased as well, with coverage close to 95%. When outcome data were missing not at random, all methods gave biased estimates, but handling missing outcome data by means of 1 of the 3 methods reduced bias compared with a complete case analysis without covariate adjustment. Complete case analysis with covariate adjustment and multiple imputation yield similar estimates in the event of missing outcome data, as long as the same predictors of missingness are included. Hence, complete case analysis with covariate adjustment can and should be used as the analysis of choice more often. Multiple imputation, in addition, can accommodate the missing-not-at-random scenario more flexibly, making it especially suited for sensitivity analyses.

13.
A popular method for analysing repeated-measures data is generalized estimating equations (GEE). When response data are missing at random (MAR), two modifications of GEE use inverse-probability weighting and imputation. The weighted GEE (WGEE) method involves weighting observations by their inverse probability of being observed, according to some assumed missingness model. Imputation methods involve filling in missing observations with values predicted by an assumed imputation model. WGEE are consistent when the data are MAR and the dropout model is correctly specified. Imputation methods are consistent when the data are MAR and the imputation model is correctly specified. Recently, doubly robust (DR) methods have been developed. These involve both a model for probability of missingness and an imputation model for the expectation of each missing observation, and are consistent when either is correct. We describe DR GEE, and illustrate their use on simulated data. We also analyse the INITIO randomized clinical trial of HIV therapy allowing for MAR dropout. Copyright © 2009 John Wiley & Sons, Ltd.
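The inverse-probability-weighting half of this machinery can be illustrated with a single follow-up mean: weight each complete case by the inverse of its estimated probability of being observed. A numpy-only sketch under an assumed MAR dropout model (all names and coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
base = rng.normal(size=n)                        # baseline covariate, always observed
y = 1.0 + 0.8 * base + rng.normal(size=n)        # follow-up outcome, true mean 1.0
p_true = 1 / (1 + np.exp(-(0.5 + 1.5 * base)))   # MAR: dropout depends on baseline only
observed = rng.random(n) < p_true

# Fit the missingness model (logistic regression of `observed` on `base`)
# with a few Newton-Raphson steps.
X = np.column_stack([np.ones(n), base])
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    W = p * (1 - p)
    b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (observed - p))

p_hat = 1 / (1 + np.exp(-X @ b))
w = 1 / p_hat[observed]                          # inverse-probability weights
naive = y[observed].mean()                       # biased upward: dropouts have low `base`
ipw = np.average(y[observed], weights=w)         # approximately unbiased under MAR
```

As the abstract notes, this weighted estimate is only consistent when the dropout model is correctly specified; the DR construction pairs it with an imputation model so that getting either one right suffices.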

14.
BACKGROUND: Health-related quality of life is becoming increasingly important in the medical field. Nevertheless, methodological problems persist, particularly in processing missing data on quality of life questionnaires. This leads to three difficulties: (i) loss of power; (ii) bias; (iii) choice of the most adequate method for treating missing data. Prevention is the best recommendation to avoid unanswered questions, but it does not guarantee the absence of missing data. The treatment of missing data therefore depends on: (i) identification of the missing data mechanism and (ii) choice of the most appropriate method to correct the data. The main objective of this article is to illustrate the identification of the mechanisms behind item non-response in the SF-36 questionnaire in the SU.VI.MAX study. METHODS: A logistic regression on the characteristics of the subjects was used to distinguish between two missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). Two global chi-square tests of the MCAR mechanism were proposed. The missing not at random (MNAR) mechanism was also analysed considering the questionnaire's features. RESULTS: The percentage of non-responses was small (1.7%), with a maximum of 3% for four questions of the General Health dimension (GH2 to GH5). Both global chi-square tests rejected the hypothesis that all SF-36 non-responses were MCAR. Of the 32 items with less than 2.3% non-responses, the mechanisms were MCAR for 29 items, MAR for 2 items, and probably MNAR for 1 item. The logistic regression indicated that the factors related to non-response were gender (female), age (≥50 years), attention problems, and number of children (≥3). The hierarchical relation of item PF5 (climb one flight of stairs) to PF4 (climb several flights of stairs) would generate MNAR non-responses, and the "I don't know" response modality of the GH2 to GH5 block would also generate MNAR non-responses. CONCLUSION: The identification of missing data mechanisms through statistical analysis and through further reflection on the questionnaire's features is a necessary preliminary step in the treatment of non-responses.
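A basic version of such a mechanism check compares respondents and non-respondents on an observed characteristic; here a simple two-sample z test stands in for the global chi-square tests used in the study, on entirely simulated data:

```python
import math
import numpy as np

rng = np.random.default_rng(5)
n = 12_000
age = rng.normal(50, 10, n)                     # an observed subject characteristic
# Suppose non-response to an item rises with age (MAR with respect to age, not MCAR).
nonresp = rng.random(n) < 1 / (1 + np.exp(-(age - 65) / 5))

def z_stat(a, b):
    """Two-sample z statistic for the difference in means of a and b."""
    se = math.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

z = z_stat(age[nonresp], age[~nonresp])
# A |z| far from 0 contradicts MCAR for this item: non-responders differ
# systematically from responders on an observed characteristic.
```

Such tests can reject MCAR but cannot separate MAR from MNAR, which is why the study also reasons from the questionnaire's structure (item hierarchy, "I don't know" options) to flag likely MNAR items.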

15.
Multiple imputation (MI) is a technique that can be used for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, containing possibly different but reasonable replacements for the missing data. Users analyse the completed datasets separately with standard techniques and then combine the results using simple formulae in a way that allows the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose an approach to utilise MI to handle the problems of missing gestational ages and implausible birthweight-gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then imputing missing and implausible gestational age values multiple times. Because missing and implausible gestational age values are not missing completely at random, using multiple imputations and, thus, incorporating both the existing relationships among the variables and the uncertainty added from the imputation, may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
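The "simple formulae" for combining results across completed datasets are Rubin's rules: average the point estimates and inflate the variance for between-imputation spread. A sketch with made-up numbers:

```python
import math

# Suppose m = 5 completed-data analyses each returned a coefficient and its
# variance (the numbers are invented for illustration).
estimates = [2.1, 2.4, 2.2, 2.6, 2.3]
variances = [0.10, 0.12, 0.11, 0.13, 0.10]

m = len(estimates)
qbar = sum(estimates) / m                                   # pooled point estimate
ubar = sum(variances) / m                                   # within-imputation variance
b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)       # between-imputation variance
total = ubar + (1 + 1 / m) * b                              # Rubin's total variance
se = math.sqrt(total)                                       # pooled standard error
```

The `(1 + 1/m) * b` term is what lets the pooled standard error carry the extra uncertainty due to the missing data, which a single imputed dataset analysed on its own would understate.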

16.

Background  

Missing data are classified as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). Knowing the mechanism is useful in identifying the most appropriate analysis. The first aim was to compare different methods for identifying the missing data mechanism, to determine whether they gave consistent conclusions. The second aim was to investigate whether reminder-response data can be utilised to help identify the missing data mechanism.

18.
Loss to follow-up is problematic in most cohort studies and often leads to bias. Although guidelines suggest acceptable follow-up rates, the authors are unaware of studies that test the validity of these recommendations. The objective of this study was to determine whether the recommended follow-up thresholds of 60-80% are associated with biased effects in cohort studies. A simulation study was conducted using 1000 computer replications of a cohort of 500 observations. The logistic regression model included a binary exposure and three confounders. Varied correlation structures of the data represented various levels of confounding. Differing levels of loss to follow-up were generated through three mechanisms: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). The authors found no important bias with levels of loss that varied from 5 to 60% when loss to follow-up was related to MCAR or MAR mechanisms. However, when observations were lost to follow-up based on a MNAR mechanism, the authors found seriously biased estimates of the odds ratios with low levels of loss to follow-up. Loss to follow-up in cohort studies rarely occurs randomly. Therefore, when planning a cohort study, one should assume that loss to follow-up is MNAR and attempt to achieve the maximum follow-up rate possible.

19.
OBJECTIVE: Properly handling missing data is a challenge, especially when working with older populations that have high levels of morbidity and mortality. We illustrate methods for understanding whether missing values are ignorable and describe implications of their use in regression modeling. STUDY DESIGN AND SETTING: The use of missingness screens, such as Little's (1988) test of missing completely at random (MCAR) and the Index of Sensitivity to Nonignorability (ISNI) of Troxel and colleagues (2004), introduces complications for regression modeling and, particularly, for risk factor selection. In a case study of older patients with simulated missing values for a delirium outcome, set in a 14-bed medical intensive care unit, we outline a model fitting process that incorporates the use of missingness screens, controls for collinearity, and selects variables based on model fit. RESULTS: The proposed model fitting process identifies more actual risk factors for ICU delirium than does a complete case analysis. CONCLUSION: Imputation and other methods for handling missing data assist in the identification of risk factors, but do so accurately only when correct assumptions are made about the nature of the missing data. Missingness screens enable researchers to investigate these assumptions.

20.
Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions and set 50% of values to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation, and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
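PMM matches cases on predicted means and donates an actually observed value, which is why it copes with non-normal variables. A deliberately simplified single-donor sketch of the idea (real implementations draw the regression parameters and sample among several nearest donors; the data-generating model here is assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(size=n)                                   # fully observed covariate
y = np.exp(1 + 0.5 * x + 0.3 * rng.normal(size=n))       # skewed incomplete variable
miss = rng.random(n) < 0.5                               # 50% flagged as missing at random
obs = ~miss

# Regress y on x among the observed cases, then, for each missing case,
# donate the observed y whose predicted mean is closest.
slope, intercept = np.polyfit(x[obs], y[obs], deg=1)
pred = intercept + slope * x
y_imp = y.copy()
for i in np.where(miss)[0]:
    donor = np.argmin(np.abs(pred[obs] - pred[i]))       # nearest donor by predicted mean
    y_imp[i] = y[obs][donor]

# Every imputed value is an actually observed y, so the skewed, strictly
# positive shape of the variable is preserved by construction.
```

This donor-based property is exactly what lets PMM avoid the implausible values (here, negative ones) that imputing on a normal scale could produce for a log-normal variable.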
