Similar Documents (20 records found)
1.
Objective: To assess the added value of multiple imputation (MI) of missing repeated outcome measures in longitudinal data sets analyzed with linear mixed-effects (LME) models.
Study Design and Setting: Data were used from a trial on the effects of Rosuvastatin on the rate of change in carotid intima-media thickness (CIMT). The reference treatment effect was derived from a complete data set. Scenarios and proportions of missing values in CIMT measurements were applied, and LME analyses were used before and after MI. The added value of MI, in terms of bias and precision, was assessed using the mean squared error (MSE) of the treatment effects and coverage of the 95% confidence interval.
Results: The reference treatment effect was −0.0177 mm/y. The MSEs for LME analysis without and with MI were similar in scenarios with up to 40% missing values. Coverage was large in all scenarios and was similar for LME with and without MI.
Conclusion: Our study empirically shows that MI of missing end point data before LME analyses does not increase precision in the estimated rate of change in the end point. Hence, MI had no added value in this setting, and standard LME modeling remains the method of choice.
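The evaluation metrics used in this abstract, MSE of a treatment-effect estimate against a reference value and coverage of the 95% confidence interval, can be sketched with a toy simulation. The data-generating numbers below (sample size, error SD) are illustrative and not taken from the trial:

```python
import numpy as np

rng = np.random.default_rng(42)
true_effect = -0.0177          # reference treatment effect (mm/y) from the abstract
n_sims, n_subjects, sigma = 2000, 200, 0.05   # illustrative values only

estimates, covered = [], []
for _ in range(n_sims):
    # each replicate: estimate the effect as a sample mean with its 95% CI
    sample = rng.normal(true_effect, sigma, n_subjects)
    est = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n_subjects)
    estimates.append(est)
    covered.append(est - 1.96 * se <= true_effect <= est + 1.96 * se)

estimates = np.array(estimates)
mse = np.mean((estimates - true_effect) ** 2)   # bias^2 + variance
coverage = np.mean(covered)
print(f"MSE = {mse:.2e}, 95% CI coverage = {coverage:.3f}")
```

Comparing these two quantities between an analysis with and without MI, as the paper does, is then a matter of running the loop under each strategy.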

2.
The usual methods for analyzing case-cohort studies rely on weighted estimators that are sometimes not fully efficient. Multiple imputation may be a good alternative because it uses all the available data and approximates the maximum partial likelihood estimator. The method is based on the generation of several plausible complete data sets, taking into account the uncertainty about missing values. When the imputation model is correctly defined, the multiple imputation estimator is asymptotically unbiased and its variance is correctly estimated. We show that a correct imputation model must be estimated from the fully observed data (cases and controls), using the case status among the explanatory variables. To validate the approach, we analyzed case-cohort studies first with completely simulated data and then with case-cohort data sampled from two real cohorts. The analyses of simulated data showed that, when the imputation model was correct, the multiple imputation estimator was unbiased and efficient. The observed gain in precision ranged from 8 to 37 per cent for phase-1 variables and from 5 to 19 per cent for the phase-2 variable. When the imputation model was misspecified, the multiple imputation estimator was still more efficient than the weighted estimators, but it was also slightly biased. The analyses of case-cohort data sampled from complete cohorts showed that even when no strong predictor of the phase-2 variable was available, multiple imputation was unbiased, as precise as the weighted estimator for the phase-2 variable, and slightly more precise than the weighted estimators for the phase-1 variables. However, the multiple imputation estimator was biased when, because of interaction terms, some coefficients of the imputation model had to be estimated from small samples. Multiple imputation is an efficient technique for analyzing case-cohort data. Practically, we suggest building the analysis model using only the case-cohort data and weighted estimators. Multiple imputation can then be used to reanalyze the data with the selected model in order to improve the precision of the results.
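The paper's central recommendation, fitting the imputation model on the fully observed data with case status among the predictors, can be sketched as a single stochastic regression imputation. All variable names and effect sizes here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)                     # phase-1 covariate, always observed
x2 = 0.5 * x1 + rng.normal(size=n)          # phase-2 covariate, costly to measure
case = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x2 - 2))))   # case status depends on x2

# phase-2 measurement available for all cases plus a 15% random subcohort
observed = (case == 1) | (rng.random(n) < 0.15)

# imputation model fitted on fully observed rows, WITH case status as a predictor
X_obs = np.column_stack([np.ones(observed.sum()), x1[observed], case[observed]])
beta, *_ = np.linalg.lstsq(X_obs, x2[observed], rcond=None)
resid_sd = np.std(x2[observed] - X_obs @ beta, ddof=3)

# stochastic draw for the missing phase-2 values
miss = ~observed
X_miss = np.column_stack([np.ones(miss.sum()), x1[miss], case[miss]])
x2_imp = x2.copy()
x2_imp[miss] = X_miss @ beta + resid_sd * rng.normal(size=miss.sum())
```

Repeating the draw m times and pooling the completed-data analyses would give the full MI estimator the abstract evaluates.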

3.
The Myocardial Ischaemia National Audit Project (MINAP) is a register of heart attacks covering 234 acute admitting hospitals in England and Wales. It is used to assess the extent to which hospitals are attaining the government targets for patients with heart attacks (myocardial infarction). MINAP is therefore of national importance in coronary care and of potential international importance for research. As with most observational databases, there is missing data in MINAP, which has the potential to bias statistical analyses. In this paper, we use multiple imputation to reduce the impact of missing data and we give details of how our imputation scheme was implemented. The key contribution of this paper is the provision of multiply completed datasets, suited to a range of analyses, that can be used to make efficient inferences without the distractions of missing data. Our work will assist MINAP in achieving its priority goal of providing useful data with which to analyse patient care.

4.

Background

When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI).

Methods

Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1–0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete.

Results

Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the fraction of missing information. Consistent with this, the ALSPAC analysis showed that inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest.

Conclusions

In longitudinal studies with loss to follow-up, incorporating proxies for the study outcome, obtained via linkage to external sources of data, as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
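A minimal sketch of the mechanism described above: when the outcome is MNAR, a linked proxy that is highly correlated with the outcome can absorb part of the missingness mechanism. This uses one deterministic regression imputation rather than full MI, and all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000
y = rng.normal(size=n)                       # study outcome (true mean 0)
proxy = 0.9 * y + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # linked proxy, corr ~0.9

# MNAR: higher outcomes are more likely to be missing
missing = rng.random(n) < 1 / (1 + np.exp(-2 * y))
obs = ~missing

mean_cc = y[obs].mean()                      # complete-case estimate (biased)

# regression imputation of y from the proxy, fitted on observed rows
X = np.column_stack([np.ones(obs.sum()), proxy[obs]])
beta, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
y_imp = y.copy()
y_imp[missing] = beta[0] + beta[1] * proxy[missing]
mean_imp = y_imp.mean()

print(f"complete-case bias {mean_cc:+.3f}, with-proxy bias {mean_imp:+.3f}")
```

The residual bias shrinks as the outcome-proxy correlation rises, matching the paper's observation about correlations above 0.5.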

5.
We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
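The 'passive after log-transformation' strategy recommended above can be sketched as follows; the variable names, means, and missingness rate are invented for illustration, and only the denominator is set missing:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
log_num = rng.normal(np.log(5.0), 0.3, n)    # e.g. log total cholesterol (illustrative)
log_den = rng.normal(np.log(1.4), 0.3, n)    # e.g. log HDL cholesterol (illustrative)
miss = rng.random(n) < 0.3                   # denominator missing in 30% of subjects

# impute the LOG-denominator from the log-numerator (stochastic regression draw),
# then derive the ratio passively from the completed components
X = np.column_stack([np.ones((~miss).sum()), log_num[~miss]])
beta, *_ = np.linalg.lstsq(X, log_den[~miss], rcond=None)
sd = np.std(log_den[~miss] - X @ beta, ddof=2)

log_den_imp = log_den.copy()
log_den_imp[miss] = beta[0] + beta[1] * log_num[miss] + sd * rng.normal(size=miss.sum())
ratio = np.exp(log_num - log_den_imp)        # passive step: ratio computed after imputation
```

Because the imputation happens on the log scale, the derived ratio is guaranteed positive, which is one reason this variant avoids the downward bias the simulation study attributes to untransformed passive imputation.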

6.
The problem of missing data is frequently encountered in observational studies. We compared approaches to dealing with missing data. Three multiple imputation methods were compared with a method of enhancing a clinical database through merging with administrative data. The clinical database used for comparison contained information collected from 6,065 cardiac care patients in 1995 in the province of Alberta, Canada. The effectiveness of the different strategies was evaluated using measures of discrimination and goodness of fit for the 1995 data. The strategies were further evaluated by examining how well the models predicted outcomes in data collected from patients in 1996. In general, the different methods produced similar results, with one of the multiple imputation methods demonstrating a slight advantage. It is concluded that the choice of missing data strategy should be guided by statistical expertise and data resources.

7.
Multiple imputation of missing blood pressure covariates in survival analysis
This paper studies a non-response problem in survival analysis where the occurrence of missing data in the risk factor is related to mortality. In a study to determine the influence of blood pressure on survival in the very old (85+ years), blood pressure measurements are missing in about 12.5 per cent of the sample. The available data suggest that the process that created the missing data depends jointly on survival and the unknown blood pressure, thereby distorting the relation of interest. Multiple imputation is used to impute missing blood pressure and then analyse the data under a variety of non-response models. One special modelling problem is treated in detail: the construction of a predictive model for drawing imputations when the number of variables is large. Risk estimates for these data appear robust to even large departures from the simplest non-response model, and are similar to those derived under deletion of the incomplete records.
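The 'predictive model for drawing imputations' step can be illustrated with a standard Bayesian normal linear regression, in which the residual variance and coefficients are drawn from their posterior before each imputation is generated (this is the textbook proper-imputation recipe, not necessarily the exact model used in the paper; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([120.0, 8.0, -5.0])
bp = X @ beta_true + 10 * rng.normal(size=n)   # observed blood-pressure-like variable

# posterior draws under a noninformative prior:
#   sigma^2 | data          ~ SSR / chi2(n - p)
#   beta | sigma^2, data    ~ N(betahat, sigma^2 (X'X)^-1)
betahat, *_ = np.linalg.lstsq(X, bp, rcond=None)
ssr = np.sum((bp - X @ betahat) ** 2)
XtX_inv = np.linalg.inv(X.T @ X)

sigma2 = ssr / rng.chisquare(n - p)
beta_draw = rng.multivariate_normal(betahat, sigma2 * XtX_inv)

# one proper imputation for 50 hypothetical subjects with missing blood pressure
X_miss = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
bp_imp = X_miss @ beta_draw + np.sqrt(sigma2) * rng.normal(size=50)
```

Drawing the parameters afresh for each of the m imputations is what propagates parameter uncertainty into the between-imputation variance.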

8.
Multiple imputation of baseline data in the Cardiovascular Health Study
Most epidemiologic studies will encounter missing covariate data. Software packages typically used for analyzing data delete any cases with a missing covariate to perform a complete case analysis. The deletion of cases complicates variable selection when different variables are missing on different cases, reduces power, and creates the potential for bias in the resulting estimates. Recently, software has become available for producing multiple imputations of missing data that account for the between-imputation variability. The implementation of the software to impute missing baseline data in the setting of the Cardiovascular Health Study, a large, observational study, is described. Results of exploratory analyses using the imputed data were largely consistent with results using only complete cases, even in a situation where one third of the cases were excluded from the complete case analysis. There were few differences in the exploratory results across three imputations, and the combined results from the multiple imputations were very similar to results from a single imputation. An increase in power was evident and variable selection was simplified when using the imputed data sets.

9.
Multiple imputation: review of theory, implementation and software
Harel O, Zhou XH. Statistics in Medicine 2007; 26(16): 3057-3077.
Missing data are a common complication in data analysis. In many medical settings missing data can cause difficulties in estimation, precision and inference. Multiple imputation (MI) (Multiple Imputation for Nonresponse in Surveys. Wiley: New York, 1987) is a simulation-based approach for dealing with incomplete data. Although there are many different methods for handling incomplete data, MI has become one of the leading methods. Since the late 1980s there has been a constant increase in the use and publication of MI-related research. This tutorial does not attempt to cover all the material concerning MI, but rather provides an overview that brings together the theory behind MI and its implementation, and discusses the growing range of commercial and free software for MI. We illustrate some of the major points using an example from an Alzheimer disease (AD) study. In this AD study, while clinical data are available for all subjects, postmortem data are available only for the subset who died and underwent an autopsy. Analysis of incomplete data requires making unverifiable assumptions; these assumptions are discussed in detail in the text. Relevant S-Plus code is provided.
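The pooling step shared by all MI analyses discussed in this tutorial follows Rubin's rules: average the completed-data estimates, and combine within- and between-imputation variance. A minimal sketch (the numbers fed in are made up):

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine m completed-data estimates via Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    qbar = estimates.mean()                  # pooled point estimate
    ubar = variances.mean()                  # within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    t = ubar + (1 + 1 / m) * b               # total variance
    return qbar, t

# e.g. five completed-data estimates of a regression coefficient (invented)
qbar, t = pool_rubin([0.52, 0.48, 0.55, 0.50, 0.49],
                     [0.010, 0.011, 0.009, 0.010, 0.012])
print(f"pooled estimate {qbar:.3f}, total variance {t:.4f}")
```

The total variance always exceeds the average within-imputation variance, which is how MI propagates the uncertainty due to the missing values into the final standard error.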

10.
Background and Objectives: As a result of the development of sophisticated techniques such as multiple imputation, interest in handling missing data in longitudinal studies has increased enormously in past years. Within the field of longitudinal data analysis, there is an ongoing debate on whether it is necessary to use multiple imputation before performing a mixed-model analysis of longitudinal data. In the current study this necessity is evaluated.
Study Design and Setting: The results of mixed-model analyses with and without multiple imputation were compared with each other. Four data sets with missing values were created: one with values missing completely at random, two with values missing at random, and one with values missing not at random. In all data sets, the relationship between a continuous outcome variable and two different covariates was analyzed: a time-independent dichotomous covariate and a time-dependent continuous covariate.
Results: Although for all types of missing data the results of the mixed-model analysis with and without multiple imputation were slightly different, they were not in favor of either approach. In addition, repeating the multiple imputation 100 times showed that the results of the mixed-model analysis with multiple imputation were quite unstable.
Conclusion: It is not necessary to handle missing data using multiple imputation before performing a mixed-model analysis on longitudinal data.

11.
12.
BACKGROUND AND OBJECTIVE: Epidemiologic studies commonly estimate associations between predictors (risk factors) and outcome. Most software automatically excludes subjects with missing values. This commonly causes bias because missing values seldom occur completely at random (MCAR) but rather selectively based on other (observed) variables, i.e. missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information, including the outcome, is advocated to deal with selective missing values. This seems a self-fulfilling prophecy. METHODS: We tested this hypothesis using data from a study on the diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs) estimated from the original sample were considered as "true" values. We assigned missing values to these predictors, both MCAR and MAR, and repeated this 1,000 times using simulations. In each simulation we multiply imputed the missing values without and with the outcome, and compared the regression coefficients and SEs to the truth. RESULTS: Regression coefficients based on MI including the outcome were close to the truth. MI without the outcome yielded severely biased (underestimated) coefficients. SEs and coverage of the 90% confidence intervals did not differ between MI with and without the outcome. Results were the same for MCAR and MAR. CONCLUSION: For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without the outcome and is no self-fulfilling prophecy.
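The paper's conclusion can be reproduced in miniature: impute a missing predictor with and without the outcome in the imputation model, and compare the resulting regression slopes with the true value. The simulation parameters below are illustrative (mean imputation stands in for the "without outcome" strategy):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 20000
x = rng.normal(size=n)                     # predictor with missing values
y = x + rng.normal(size=n)                 # fully observed outcome; true slope = 1

miss = rng.random(n) < 1 / (1 + np.exp(-y))   # MAR: missingness depends on observed y
obs = ~miss

def slope(xv, yv):
    # OLS slope of yv on xv
    return np.cov(xv, yv)[0, 1] / np.var(xv, ddof=1)

# (a) imputation WITHOUT the outcome: mean imputation from observed x
x_no = x.copy()
x_no[miss] = x[obs].mean()

# (b) imputation WITH the outcome: stochastic regression of x on y
X = np.column_stack([np.ones(obs.sum()), y[obs]])
beta, *_ = np.linalg.lstsq(X, x[obs], rcond=None)
sd = np.std(x[obs] - X @ beta, ddof=2)
x_yes = x.copy()
x_yes[miss] = beta[0] + beta[1] * y[miss] + sd * rng.normal(size=miss.sum())

print(f"slope without outcome {slope(x_no, y):.2f}, with outcome {slope(x_yes, y):.2f}")
```

Note the stochastic residual draw in (b): deterministic conditional-mean imputation from the outcome would overstate the association, so the noise term is essential to the "no self-fulfilling prophecy" result.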

13.
In this article, we will present statistical methods to assess to what extent the effect of a randomised treatment (versus control) on a time-to-event endpoint might be explained by the effect of treatment on a mediator of interest, a variable that is measured longitudinally at planned visits throughout the trial. In particular, we will show how to identify and infer the path-specific effect of treatment on the event time via the repeatedly measured mediator levels. The considered proposal addresses complications due to patients dying before the mediator is assessed, due to the mediator being repeatedly measured, and due to posttreatment confounding of the effect of the mediator by other mediators. We illustrate the method by an application to data from the LEADER cardiovascular outcomes trial.

14.
Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
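A bare-bones sketch of predictive mean matching as discussed above: predict the incomplete variable, then for each missing case donate the observed value of a case with a similar predicted value. This uses a single OLS fit for the predictions, glossing over the type-1/type-2 distinction mentioned in the abstract; the skewed variable and donor pool size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(13)
n = 2000
z = rng.normal(size=n)
w = np.exp(0.5 * z + 0.5 * rng.normal(size=n))   # skewed, non-normal incomplete variable
miss = rng.random(n) < 0.5
obs = ~miss

# predictive mean matching: predict w from z, then for each missing case
# donate the OBSERVED w whose predicted value is nearest (5 candidate donors)
X = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(X[obs], w[obs], rcond=None)
pred = X @ beta

w_imp = w.copy()
w_obs, pred_obs = w[obs], pred[obs]
for i in np.where(miss)[0]:
    donors = np.argsort(np.abs(pred_obs - pred[i]))[:5]   # 5 nearest predicted values
    w_imp[i] = w_obs[rng.choice(donors)]                  # donate a real observed value
```

Because every imputed value is an actually observed value, PMM automatically respects the variable's skewness and support, which is why the abstract offers it as an alternative to choosing a transformation.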

15.
BACKGROUND: Nonresponse bias is a concern in any epidemiologic survey in which a subset of selected individuals declines to participate. METHODS: We reviewed multiple imputation, a widely applicable and easy-to-implement Bayesian methodology to adjust for nonresponse bias. To illustrate the method, we used data from the Canadian Multicentre Osteoporosis Study, a large cohort study of 9423 randomly selected Canadians, designed in part to estimate the prevalence of osteoporosis. Although subjects were randomly selected, only 42% of individuals who were contacted agreed to participate fully in the study. The study design included a brief questionnaire for those invitees who declined further participation, in order to collect information on the major risk factors for osteoporosis. These risk factors (which included age, sex, previous fractures, family history of osteoporosis, and current smoking status) were then used to estimate the missing osteoporosis status for nonparticipants using multiple imputation. Both ignorable and nonignorable imputation models are considered. RESULTS: Our results suggest that selection bias in the study is of concern, but only slightly, and only in the very elderly (age 80+ years), among both women and men. CONCLUSIONS: Epidemiologists should consider using multiple imputation more often than is current practice.

16.
We explore several approaches for imputing partially observed covariates when the outcome of interest is a censored event time and when there is an underlying subset of the population that will never experience the event of interest. We call these subjects ‘cured’, and we consider the case where the data are modeled using a Cox proportional hazards (CPH) mixture cure model. We study covariate imputation approaches using fully conditional specification. We derive the exact conditional distribution and suggest a sampling scheme for imputing partially observed covariates in the CPH cure model setting. We also propose several approximations to the exact distribution that are simpler and more convenient to use for imputation. A simulation study demonstrates that the proposed imputation approaches outperform existing imputation approaches for survival data without a cure fraction in terms of bias in estimating CPH cure model parameters. We apply our multiple imputation techniques to a study of patients with head and neck cancer. Copyright © 2016 John Wiley & Sons, Ltd.

17.
Summary: This research project investigated the relationship between chewing gum and short-term learning, as prior studies had reported conflicting results. Incoming first-year dental students were assigned by stratified randomisation to either a group who chewed gum during lectures and examinations or a group that did not chew gum. The research subjects listened to a taped lecture on dental anatomy and then completed two examinations: (1) a test of specific knowledge, which was a multiple-choice test on the dental anatomy lecture material; and (2) a test of generalised knowledge, which was a standardised reading comprehension exam. Statistical analysis of the results showed that in a group of graduate students with a history of high academic performance, there was no difference in learning between research subjects who chewed gum compared with those who did not chew gum, as measured by performance on either test.

18.
The relation of longitudinally measured blood pressure to cognitive performance in the absence of clinically diagnosed cerebrovascular disease was investigated in the Framingham Study. In 1976-1978, neuropsychologic testing was administered to 1993 participants aged 55-89 years. Performance on an education-adjusted composite of these tests was examined in relation to measures of chronicity of hypertension as well as the average systolic and average diastolic blood pressure. All analyses were stratified by antihypertensive medication use during the 2 years prior to cognitive testing and adjusted for age, sex, occupation, alcohol consumption, and participation rate in prior examination cycles. Among subjects on drug therapy for hypertension, there was no association between cognitive performance and longitudinally measured blood pressure. The proportion of cycles in which hypertension was present and average systolic and diastolic blood pressure had a significant inverse relation with cognitive performance only in the group not on antihypertensive drug therapy. However, among subjects on antihypertensive medication at earlier cycles, there was a highly significant graded relation between cognitive impairment and the probability of being off medication at the time of testing. These results suggest that hypertension-related subclinical vascular disease is not an important cause of cognitive impairment in the elderly. Cognitive impairment may, however, be associated with a reduced adherence to drug treatment regimens.

19.
Estimating velocity and acceleration trajectories allows novel inferences in the field of longitudinal data analysis, such as estimating change regions rather than change points, and testing group effects on nonlinear change in an outcome (i.e., a nonlinear interaction). In this article, we develop derivative estimation for two standard approaches: polynomial mixed models and spline mixed models. We compare their performance with an established method, principal component analysis through conditional expectation, in a simulation study. We then apply the methods to repeated blood pressure (BP) measurements in a UK cohort of pregnant women, where the goals of the analysis are to (i) identify and estimate regions of BP change for each individual and (ii) investigate the association between parity and BP change at the population level. The penalized spline mixed model had the lowest bias in our simulation study, and we identified evidence for BP change regions in over 75% of pregnant women. Using the mean velocity difference revealed differences in BP change between women in their first pregnancy and those who had at least one previous pregnancy. We recommend the use of penalized spline mixed models for derivative estimation in longitudinal data analysis.
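Derivative (velocity) estimation can be illustrated with the simpler of the two approaches named above, a polynomial fit differentiated analytically; the trajectory and noise level are invented, and a penalized spline mixed model, which the authors recommend, would replace the plain polynomial in practice:

```python
import numpy as np

rng = np.random.default_rng(17)
t = np.linspace(0, 1, 200)                    # e.g. rescaled follow-up time
bp = 100 + 20 * t**2 + rng.normal(0, 1, 200)  # true trajectory 100 + 20 t^2, velocity 40 t

coefs = np.polyfit(t, bp, deg=2)              # fit the trajectory polynomial
vel = np.polyder(coefs)                       # differentiate the fitted polynomial

velocity_at_half = np.polyval(vel, 0.5)       # true velocity at t = 0.5 is 20
print(f"estimated velocity at t=0.5: {velocity_at_half:.2f}")
```

Scanning the estimated velocity curve for intervals where its confidence band excludes zero is one way to identify the "change regions" the paper describes.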

20.
Harel O, Zhou XH. Statistics in Medicine 2007; 26(11): 2370-2388.
Two-phase designs are common in epidemiological studies of dementia, and especially in Alzheimer research. In the first phase, all subjects are screened using a common screening test(s), while in the second phase, only a subset of these subjects is tested using a more definitive verification assessment, i.e. a gold standard test. When comparing the accuracy of two screening tests in a two-phase study of dementia, inferences are commonly made using only the verified sample. It is well documented that in this case there is a risk of bias, known as verification bias. When the two screening tests have only two values (e.g. positive and negative) and we are trying to estimate the differences in sensitivities and specificities of the tests, one is actually estimating a confidence interval for a difference of binomial proportions, which is not trivial even with complete data. In this paper, we suggest ways to apply imputation procedures in order to correct the verification bias. This approach allows us to use well-established complete-data methods for estimating the difference of two binomial proportions, in addition to dealing with the incomplete data. We compare different methods of estimation and evaluate the use of multiple imputation in this case. Our simulation results show that the use of multiple imputation is superior to other commonly used methods. We demonstrate our findings using Alzheimer data. Copyright (c) 2006 John Wiley & Sons, Ltd.
