Similar Documents
20 similar documents found.
1.
Most implementations of multiple imputation (MI) of missing data are designed for simple rectangular data structures ignoring temporal ordering of data. Therefore, when applying MI to longitudinal data with intermittent patterns of missing data, some alternative strategies must be considered. One approach is to divide data into time blocks and implement MI independently at each block. An alternative approach is to include all time blocks in the same MI model. With increasing numbers of time blocks, this approach is likely to break down because of collinearity and over-fitting. The new two-fold fully conditional specification (FCS) MI algorithm addresses these issues by conditioning only on measurements that are local in time. We describe and report the results of a novel simulation study to critically evaluate the two-fold FCS algorithm and its suitability for imputation of longitudinal electronic health records. After generating a full data set, approximately 70% of selected continuous and categorical variables were made missing completely at random in each of ten time blocks. Subsequently, we applied a simple time-to-event model. We compared efficiency of estimated coefficients from a complete records analysis, MI of data in the baseline time block and the two-fold FCS algorithm. The results show that the two-fold FCS algorithm maximises the use of data available, with the gain relative to baseline MI depending on the strength of correlations within and between variables. Using this approach also increases plausibility of the missing at random assumption by using repeated measures over time of variables whose baseline values may be missing. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
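As a hedged illustration of the local-in-time conditioning idea (not the authors' implementation), the Python sketch below imputes each time block using only variables from the same and adjacent blocks; the column layout, block width and use of scikit-learn's IterativeImputer are assumptions for demonstration.

```python
# A single-pass sketch of the local-in-time idea: impute the variables in each
# time block conditioning only on the same and adjacent blocks. Column naming,
# block layout and the use of IterativeImputer are illustrative assumptions;
# the actual two-fold FCS algorithm iterates this among-time cycle several times.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n, n_blocks, p = 200, 10, 3
cols = [f"x{j}_t{t}" for t in range(n_blocks) for j in range(p)]
data = pd.DataFrame(rng.normal(size=(n, n_blocks * p)), columns=cols)
data_mis = data.mask(rng.random(data.shape) < 0.3)   # 30% MCAR missingness

imputed = data_mis.copy()
for t in range(n_blocks):
    # Time window: the current block plus its immediate neighbours.
    near = {max(t - 1, 0), t, min(t + 1, n_blocks - 1)}
    window = [c for c in cols if int(c.split("_t")[1]) in near]
    block = [c for c in cols if c.endswith(f"_t{t}")]
    imp = IterativeImputer(sample_posterior=True, max_iter=5, random_state=t)
    filled = pd.DataFrame(imp.fit_transform(imputed[window]),
                          columns=window, index=imputed.index)
    imputed[block] = filled[block]   # keep draws for the current block only
# Repeating the whole pass M times with different seeds yields M imputations.
```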

2.
We consider the analysis of longitudinal data sets that include times of recurrent events, where interest lies in variables that are functions of the number of events and the time intervals between events for each individual, and where some cases have gaps when the information was not recorded. Discarding cases with gaps results in a loss of the recorded information in those cases. Other strategies such as simply splicing together the intervals before and after the gap potentially lead to bias. A relatively simple imputation approach is developed that bases the number and times of events within the gap on matches to completely recorded histories. Multiple imputation is used to propagate imputation uncertainty. The procedure is developed here for menstrual calendar data, where the recurrent events are menstrual bleeds recorded longitudinally over time. The recording is somewhat onerous, leading to gaps in the calendar data. The procedure is applied to two important data sets for assessing the menopausal transition, the Melbourne Women's Midlife Health Project and the TREMIN data. A simulation study is presented to assess the statistical properties of the proposed procedure. Some possible extensions of the approach are also considered.
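A minimal sketch of the matching idea, under assumed conventions (calendars stored as lists of cycle lengths, matching on mean pre-gap cycle length); the actual matching criteria in the paper may differ.

```python
# Hedged sketch of gap imputation by matching to complete histories: donors
# are fully recorded calendars; recipients are matched on mean pre-gap cycle
# length, and a random nearby donor's cycle lengths fill the gap. The matching
# metric, donor-pool size and data layout are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def impute_gap(pre_gap_cycles, gap_length, donors, m=5, pool_size=10):
    """Return m imputed cycle-length sequences spanning a gap of gap_length days."""
    target = np.mean(pre_gap_cycles)
    order = np.argsort([abs(np.mean(d) - target) for d in donors])
    pool = [donors[i] for i in order[:pool_size]]       # nearest donors
    imputations = []
    for _ in range(m):
        donor = pool[rng.integers(len(pool))]
        start = int(rng.integers(len(donor)))
        cycles, total, i = [], 0.0, start
        while total + donor[i % len(donor)] <= gap_length:
            cycles.append(donor[i % len(donor)])
            total += donor[i % len(donor)]
            i += 1
        imputations.append(cycles)                      # imputed bleeds in the gap
    return imputations

donors = [list(rng.normal(28, 3, size=24)) for _ in range(50)]
print(impute_gap([27.5, 29.0, 28.2], gap_length=180, donors=donors, m=3))
```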

3.
Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems created by high ages in cohort studies. Because of the risk of disclosure, ages of very old respondents often cannot be released; in particular, this is a specific stipulation of the Health Insurance Portability and Accountability Act (HIPAA) for the release of health data for individuals. Top-coding of individuals beyond a certain age is a standard way of dealing with this issue, and it may be adequate for cross-sectional data, when a modest number of cases are affected. However, this approach leads to serious loss of information in longitudinal studies when individuals have been followed for many years. We propose and evaluate an alternative to top-coding for this situation based on multiple imputation (MI). This MI method is applied to a survival analysis of simulated data, and data from the Charleston Heart Study (CHS), and is shown to work well in preserving the relationship between hazard and covariates. Copyright © 2010 John Wiley & Sons, Ltd.
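A hedged sketch of the general idea, assuming a simple exponential model for the age tail; the paper's actual imputation model and release threshold may differ.

```python
# Hedged sketch of the MI alternative to top-coding: for public release, ages
# above the threshold are replaced by multiple draws from a model of the age
# tail (a simple exponential tail here). The threshold and the tail model are
# illustrative assumptions, not the paper's fitted model.
import numpy as np

rng = np.random.default_rng(2)

def impute_topcoded(ages, threshold=90, m=5):
    excess = ages[ages >= threshold] - threshold
    scale = excess.mean()                    # MLE of the exponential scale
    datasets = []
    for _ in range(m):
        a = ages.copy()
        top = a >= threshold
        a[top] = threshold + rng.exponential(scale, size=top.sum())
        datasets.append(a)                   # analyze each, pool with Rubin's rules
    return datasets

ages = rng.normal(70, 12, size=5000).clip(40, 110)   # stand-in cohort ages
released = impute_topcoded(ages)
```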

4.
It is common for longitudinal clinical trials to face problems of item non-response, unit non-response, and drop-out. In this paper, we compare two alternative methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression. One approach combines hot-deck (HD) multiple imputation using a predictive mean matching method for item non-response and the approximate Bayesian bootstrap for unit non-response. A second method is based on a multivariate normal (MVN) model using PROC MI in SAS software V8.2. These two methods are contrasted with a last observation carried forward (LOCF) technique and available-case (AC) analysis in a simulation study where replicate analyses are performed on subsets of the originally complete cases. Missing-data patterns were simulated to be consistent with missing-data patterns found in the originally incomplete cases, and observed complete data means were taken to be the targets of estimation. Not surprisingly, the LOCF and AC methods had poor coverage properties for many of the variables evaluated. Multiple imputation under the MVN model performed well for most variables but produced less than nominal coverage for variables with highly skewed distributions. The HD method consistently produced close to nominal coverage, with interval widths that were roughly 7 per cent larger on average than those produced from the MVN model.
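A minimal sketch of the predictive mean matching step underlying the hot-deck approach; the regression model, donor-pool size k=5 and data layout are conventional choices assumed here, not details from the paper.

```python
# Hedged sketch of hot-deck MI via predictive mean matching: regress the
# incomplete item on observed covariates and, for each missing case, borrow
# an observed value from one of the k donors with the closest predicted mean.
# For proper MI, each imputation would refit on an approximate-Bayesian-
# bootstrap resample; k=5 is a conventional choice, not from the paper.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

def pmm_impute(y, X, k=5):
    obs = ~np.isnan(y)
    model = LinearRegression().fit(X[obs], y[obs])
    pred_obs, pred_mis = model.predict(X[obs]), model.predict(X[~obs])
    donors = y[obs]
    y_imp = y.copy()
    for i, pm in zip(np.where(~obs)[0], pred_mis):
        nearest = np.argsort(np.abs(pred_obs - pm))[:k]  # k closest donors
        y_imp[i] = donors[rng.choice(nearest)]           # draw one at random
    return y_imp

X = rng.normal(size=(300, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=300)
y[rng.random(300) < 0.25] = np.nan
completed = pmm_impute(y, X)
```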

5.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete-case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
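The proposed index is simple to compute. A sketch, assuming FICO is taken literally as the fraction of incomplete cases among the cases in which the covariate of interest is observed:

```python
import numpy as np
import pandas as pd

def fico(df: pd.DataFrame, covariate: str) -> float:
    observed = df[covariate].notna()               # covariate itself observed
    incomplete = df[observed].isna().any(axis=1)   # ...but something else missing
    return incomplete.mean()

df = pd.DataFrame({"x1": [1.0, 2.0, np.nan, 4.0],
                   "x2": [np.nan, 1.0, 2.0, 3.0]})
print(fico(df, "x1"))   # 1 of the 3 observed-x1 cases is incomplete -> 0.333
```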

6.
We address the problem of joint analysis of more than one series of longitudinal measurements. The typical way of approaching this problem is as a joint mixed effects model for the two outcomes. Apart from the large number of parameters needed to specify such a model, perhaps the biggest drawback of this approach is the difficulty in interpreting the results of the model, particularly when the main interest is in the relation between the two longitudinal outcomes. Here we propose an alternative approach to this problem. We use a latent class joint model for the longitudinal outcomes in order to reduce the dimensionality of the problem. We then use a two-stage estimation procedure to estimate the parameters in this model. In the first stage, the latent classes, their probabilities and the mean and covariance structure are estimated based on the longitudinal data of the first outcome. In the second stage, we study the relation between the latent classes and patient characteristics and the other outcome(s). We apply the method to data from 195 consecutive lung cancer patients in two outpatient clinics of lung diseases in The Hague, and we study the relation between denial and longitudinal health measures. Our approach clearly revealed an interesting phenomenon: although no difference between classes could be detected for objective measures of health, patients in classes representing higher levels of denial consistently scored significantly higher in subjective measures of health. Copyright © 2008 John Wiley & Sons, Ltd.
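A hedged sketch of the two-stage flow, with a Gaussian mixture on crude per-patient trajectory summaries standing in for the full latent-class mixed model of stage 1; all data and names below are simulated stand-ins, not the paper's model.

```python
# Stage 1: cluster the first longitudinal outcome; stage 2: relate class
# membership to a second outcome. GaussianMixture is a stand-in assumption.
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
n, t = 195, 6
times = np.arange(t)

# Stage 1: two latent trajectory classes for the first longitudinal outcome.
slopes = rng.choice([-0.5, 0.3], size=n)
y = slopes[:, None] * times + rng.normal(0, 1, (n, t))
feats = np.column_stack([y.mean(axis=1), np.polyfit(times, y.T, 1)[0]])
classes = GaussianMixture(n_components=2, random_state=0).fit_predict(feats)

# Stage 2: test whether a second (e.g. subjective health) outcome differs by class.
other = rng.normal(0, 1, n) + 0.8 * classes
print(stats.ttest_ind(other[classes == 0], other[classes == 1]))
```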

7.
We propose a propensity score-based multiple imputation (MI) method to tackle missing data resulting from drop-outs and/or intermittently skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.
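A hedged sketch of one common propensity-score MI recipe (logistic missingness model, quintile strata, approximate Bayesian bootstrap within strata); these are conventional choices assumed for illustration, not necessarily the authors' exact procedure.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

def ps_impute(y, X, n_strata=5):
    miss = np.isnan(y)
    ps = LogisticRegression().fit(X, miss).predict_proba(X)[:, 1]
    strata = pd.qcut(ps, n_strata, labels=False, duplicates="drop")
    y_imp = y.copy()
    for s in np.unique(strata):
        donors = y[(strata == s) & ~miss]
        idx = np.where((strata == s) & miss)[0]
        if len(donors) and len(idx):
            pool = rng.choice(donors, size=len(donors), replace=True)  # ABB step
            y_imp[idx] = rng.choice(pool, size=len(idx))
    return y_imp   # run m times with fresh draws for multiple imputation

X = rng.normal(size=(400, 3))
y = (rng.random(400) < 0.5).astype(float)   # binary response
y[rng.random(400) < 0.2] = np.nan
completed = ps_impute(y, X)
```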

8.
Probabilistic record linkage techniques assign match weights to one or more potential matches for those individual records that cannot be assigned 'unequivocal matches' across data files. Existing methods select the single record having the maximum weight provided that this weight is higher than an assigned threshold. We argue that this procedure, which ignores all information from matches with lower weights and for some individuals assigns no match, is inefficient and may also lead to biases in subsequent analysis of the linked data. We propose that a multiple imputation framework be utilised for data that belong to records that cannot be matched unequivocally. In this way, the information from all potential matches is transferred through to the analysis stage. This procedure allows for the propagation of matching uncertainty through a full modelling process that preserves the data structure. For purposes of statistical modelling, results from a simulation example suggest that a full probabilistic record linkage is unnecessary and that standard multiple imputation will provide unbiased and efficient parameter estimates. Copyright © 2012 John Wiley & Sons, Ltd.
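A minimal sketch of the core idea: instead of keeping only the single best match above a threshold, draw one candidate per imputation with probability proportional to its match weight. Treating the weights as log-scale scores is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_links(candidates, m=10):
    """candidates: list of (record_id, match_weight) for one unresolved case."""
    ids = np.array([c[0] for c in candidates])
    w = np.exp(np.array([c[1] for c in candidates]))   # scores -> positive weights
    return rng.choice(ids, size=m, p=w / w.sum())      # one link per imputation

print(draw_links([("A17", 2.1), ("B04", 1.3), ("C88", -0.5)], m=5))
```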

9.
We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
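A hedged sketch contrasting the two strategies, with scikit-learn's IterativeImputer standing in for a full MI engine; the variable names and distributions are illustrative stand-ins for the cholesterol example.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(6)
n = 500
df = pd.DataFrame({"num": rng.normal(5.0, 1.0, n),                    # total cholesterol
                   "den": rng.normal(1.5, 0.4, n).clip(0.5, None)})   # HDL, CV well above 0.1
df.loc[rng.random(n) < 0.3, "num"] = np.nan

imp = IterativeImputer(sample_posterior=True, random_state=0)

# Passive after log-transform (one recommended option): impute logs, then derive.
filled = pd.DataFrame(imp.fit_transform(np.log(df)), columns=df.columns)
ratio_passive_log = np.exp(filled["num"] - filled["den"])

# Active: include the ratio itself as a variable in the imputation model.
df["ratio"] = df["num"] / df["den"]
filled2 = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
ratio_active = filled2["ratio"]
```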

10.
Existing joint models for longitudinal and survival data are not applicable for longitudinal ordinal outcomes with possible non-ignorable missing values caused by multiple reasons. We propose a joint model for longitudinal ordinal measurements and competing risks failure time data, in which a partial proportional odds model for the longitudinal ordinal outcome is linked to the event times by latent random variables. At the survival endpoint, our model adopts the competing risks framework to model multiple failure types at the same time. The partial proportional odds model, as an extension of the popular proportional odds model for ordinal outcomes, is more flexible and at the same time provides a tool to test the proportional odds assumption. We use a likelihood approach and derive an EM algorithm to obtain the maximum likelihood estimates of the parameters. We further show that all the parameters at the survival endpoint are identifiable from the data. Our joint model enables one to make inference for both the longitudinal ordinal outcome and the failure times simultaneously. In addition, the inference at the longitudinal endpoint is adjusted for possible non-ignorable missing data caused by the failure times. We apply the method to the NINDS rt-PA stroke trial. Our study considers the modified Rankin Scale only. Other ordinal outcomes in the trial, such as the Barthel and Glasgow scales, can be treated in the same way. Copyright © 2009 John Wiley & Sons, Ltd.

11.
Missing data due to loss to follow-up or intercurrent events are unintended, but unfortunately inevitable in clinical trials. Since the true values of missing data are never known, it is necessary to assess the impact of untestable and unavoidable assumptions about any unobserved data in sensitivity analysis. This tutorial provides an overview of controlled multiple imputation (MI) techniques and a practical guide to their use for sensitivity analysis of trials with missing continuous outcome data. These include δ- and reference-based MI procedures. In δ-based imputation, an offset term, δ, is typically added to the expected value of the missing data to assess the impact of unobserved participants having a worse or better response than those observed. Reference-based imputation draws imputed values with some reference to observed data in other groups of the trial, typically in other treatment arms. We illustrate the accessibility of these methods using data from a pediatric eczema trial and a chronic headache trial and provide Stata code to facilitate adoption. We discuss issues surrounding the choice of δ in δ-based sensitivity analysis. We also review the debate on variance estimation within reference-based analysis and justify the use of Rubin's variance estimator in this setting since, as we elaborate further within the tutorial, it provides information-anchored inference.
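The tutorial itself provides Stata code; purely as a hedged Python sketch of the δ-based step, one can impute under MAR and then add the offset to the imputed (never the observed) values in the arm of interest. The MAR engine below stands in for e.g. Stata's mi impute; df is assumed all numeric, with the arm indicator coded 0/1.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def delta_adjusted_imputations(df, outcome, arm_col, arm, deltas, m=20):
    results = {}
    for delta in deltas:
        completed = []
        for k in range(m):
            imp = IterativeImputer(sample_posterior=True, random_state=k)
            filled = pd.DataFrame(imp.fit_transform(df),
                                  columns=df.columns, index=df.index)
            dropout = df[outcome].isna() & (df[arm_col] == arm)
            filled.loc[dropout, outcome] += delta   # worsen (or improve) drop-outs
            completed.append(filled)
        results[delta] = completed   # analyze each set; pool with Rubin's rules
    return results
```

Scanning a grid of δ values and re-running the analysis on each set of completed datasets maps out how the trial conclusion changes as the MAR assumption is relaxed.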

12.
Ruan PK, Gray RJ. Statistics in Medicine 2008; 27(27): 5709-5724
We describe a non-parametric multiple imputation method that recovers the missing potential censoring information from competing risks failure times for the analysis of cumulative incidence functions. The method can be applied in the settings of stratified analyses, time-varying covariates, weighted analysis of case-cohort samples and clustered survival data analysis, where no current available methods can be readily implemented. The method uses a Kaplan-Meier imputation method for the censoring times to form an imputed data set, so cumulative incidence can be analyzed using techniques and software developed for ordinary right censored survival data. We discuss the methodology and show from both simulations and real data examples that the method yields valid estimates and performs well. The method can be easily implemented via available software with a minor programming requirement (for the imputation step). It provides a practical, alternative analysis tool for otherwise complicated analyses of cumulative incidence of competing risks data.
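A hedged sketch of the Kaplan-Meier imputation step: fit KM to the censoring distribution (swapping the event/censoring roles), then draw each needed censoring time from the conditional KM tail beyond the subject's failure time. Plain-numpy KM for brevity; assigning the tail mass beyond the last censoring time to that largest time is a pragmatic assumption, not the paper's rule.

```python
import numpy as np

rng = np.random.default_rng(7)

def km_survival(times, events):
    """Distinct event times and the KM survival curve S(t) at those times."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    uniq = np.unique(t[d == 1])
    surv, s = [], 1.0
    for u in uniq:
        s *= 1.0 - ((t == u) & (d == 1)).sum() / (t >= u).sum()
        surv.append(s)
    return uniq, np.array(surv)

def draw_censoring(after, km_t, km_s):
    """Inverse-CDF draw from the KM curve, conditional on exceeding `after`."""
    i = np.searchsorted(km_t, after, side="right") - 1
    s_after = km_s[i] if i >= 0 else 1.0
    u = rng.uniform(0.0, s_after)
    j = np.searchsorted(-km_s, -u)           # first time with S(t) <= u
    return km_t[min(j, len(km_t) - 1)]

t_fail = rng.exponential(10, 200)
t_cens = rng.exponential(12, 200)
obs = np.minimum(t_fail, t_cens)
cens_ind = (t_cens < t_fail).astype(int)     # 1 = censoring "event"
km_t, km_s = km_survival(obs, cens_ind)
print(draw_censoring(after=5.0, km_t=km_t, km_s=km_s))
```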

13.
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
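A minimal sketch of the recommended ordering: Rubin's rules within each study first, then inverse-variance combination of the pooled study results (fixed-effect shown; the heterogeneity point in the abstract calls for a random-effects model with a congenial imputation model). The numbers are illustrative.

```python
import numpy as np

def rubin(estimates, variances):
    """Pool one study's m within-imputation estimates via Rubin's rules."""
    m = len(estimates)
    qbar = np.mean(estimates)
    ubar = np.mean(variances)               # within-imputation variance
    b = np.var(estimates, ddof=1)           # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b     # estimate and total variance

def iv_meta(ests, variances):
    w = 1.0 / np.asarray(variances)
    return float(np.sum(w * ests) / np.sum(w)), float(1.0 / np.sum(w))

studies = [rubin([0.32, 0.30, 0.35], [0.010, 0.011, 0.009]),
           rubin([0.25, 0.28, 0.22], [0.020, 0.018, 0.021])]
pooled, pooled_var = iv_meta([s[0] for s in studies], [s[1] for s in studies])
```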

14.
Missing data are a common issue in cost-effectiveness analysis (CEA) alongside randomised trials and are often addressed assuming the data are 'missing at random'. However, this assumption is often questionable, and sensitivity analyses are required to assess the implications of departures from missing at random. Reference-based multiple imputation provides an attractive approach for conducting such sensitivity analyses, because missing data assumptions are framed in an intuitive way by making reference to other trial arms. For example, a plausible missing-not-at-random mechanism in a placebo-controlled trial would be to assume that participants in the experimental arm who dropped out stop taking their treatment and have similar outcomes to those in the placebo arm. Drawing on the increasing use of this approach in other areas, this paper aims to extend and illustrate the reference-based multiple imputation approach in CEA. It introduces the principles of reference-based imputation and proposes an extension to the CEA context. The method is illustrated in the CEA of the CoBalT trial evaluating cognitive behavioural therapy for treatment-resistant depression. Stata code is provided. We find that reference-based multiple imputation provides a relevant and accessible framework for assessing the robustness of CEA conclusions to different missing data assumptions.
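The paper provides Stata code; purely as a hedged Python sketch of the 'jump to reference' flavour, missing post-drop-out outcomes in the active arm can be drawn from a model of the control arm. A marginal normal draw stands in for the full conditional multivariate-normal step, and the column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)

def jump_to_reference(df, outcome, arm_col, active="active", ref="control", m=10):
    ref_vals = df.loc[df[arm_col] == ref, outcome].dropna()
    mu, sd = ref_vals.mean(), ref_vals.std(ddof=1)
    completed = []
    for _ in range(m):
        filled = df.copy()
        miss = filled[outcome].isna() & (filled[arm_col] == active)
        filled.loc[miss, outcome] = rng.normal(mu, sd, size=miss.sum())
        completed.append(filled)   # run the CEA on each; pool with Rubin's rules
    return completed
```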

15.
In designed longitudinal studies, information from the same set of subjects is collected repeatedly over time. The longitudinal measurements are often subject to missing data, which pose an analytic challenge. We propose a functional multiple imputation approach modeling longitudinal response profiles as smooth curves of time under a functional mixed effects model. We develop a Gibbs sampling algorithm to draw model parameters and imputations for missing values, using a blocking technique for an increased computational efficiency. In an illustrative example, we apply a multiple imputation analysis to data from the Panel Study of Income Dynamics and the Child Development Supplement to investigate the gradient effect of family income on children's health status. Our simulation study demonstrates that this approach performs well under varying modeling assumptions on the time trajectory functions and missingness patterns.

16.
Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
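A hedged sketch of the 'fixed effect for each cluster' strategy: cluster indicator dummies enter the imputation model so cluster means are respected. (The abstract finds this over-covers; a true multilevel imputer is the better-calibrated alternative.) Data below are simulated stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(9)
n_clusters, per = 20, 15
cluster = np.repeat(np.arange(n_clusters), per)
u = np.repeat(rng.normal(0, 1, n_clusters), per)        # cluster random effects
df = pd.DataFrame({"cluster": cluster,
                   "y1": u + rng.normal(0, 1, n_clusters * per)})
df["y2"] = 0.5 * df["y1"] + u + rng.normal(0, 1, len(df))
df.loc[rng.random(len(df)) < 0.3, "y2"] = np.nan

dummies = pd.get_dummies(df["cluster"], prefix="c", drop_first=True).astype(float)
design = pd.concat([df[["y1", "y2"]], dummies], axis=1)
imp = IterativeImputer(sample_posterior=True, random_state=0)
completed = pd.DataFrame(imp.fit_transform(design), columns=design.columns)
df["y2_imp"] = completed["y2"]
```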

17.
18.
Conventional pattern-mixture models can be highly sensitive to model misspecification. In many longitudinal studies, where the nature of the drop-out and the form of the population model are unknown, interval estimates from any single pattern-mixture model may suffer from undercoverage, because uncertainty about model misspecification is not taken into account. In this article, a new class of Bayesian random coefficient pattern-mixture models is developed to address potentially non-ignorable drop-out. Instead of imposing hard equality constraints to overcome inherent inestimability problems in pattern-mixture models, we propose to smooth the polynomial coefficient estimates across patterns using a hierarchical Bayesian model that allows random variation across groups. Using real and simulated data, we show that multiple imputation under a three-level linear mixed-effects model which accommodates a random level due to drop-out groups can be an effective method to deal with non-ignorable drop-out by allowing model uncertainty to be incorporated into the imputation process.

19.
We propose a semiparametric joint model for bivariate longitudinal ordinal outcomes and competing risks failure time data. The association between the longitudinal and survival endpoints is captured by latent random effects. This approach generalizes previous joint analysis that considers only one response variable at the longitudinal endpoint. One unique feature of the proposed model is that we relax the commonly used normality assumption for random effects and leave the distribution completely unspecified. We use a modified version of the vertex exchange method in conjunction with an expectation-maximization algorithm to estimate the random effects distribution and model parameters. We show via simulations that robust parameter estimates are obtained from the proposed method under various scenarios. We illustrate the approach using cough severity and frequency data from a scleroderma lung study.

20.
Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
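A hedged sketch of the Wald-type pooling that underlies such Wald-based selection: the multivariate statistic often called D1 (Li, Raghunathan and Rubin), which combines within- and between-imputation covariance to test a joint null across m imputations. Whether the paper uses exactly this variant is an assumption.

```python
import numpy as np

def d1_statistic(estimates, covariances):
    """estimates: (m, k) per-imputation coefficients; covariances: (m, k, k)."""
    m, k = estimates.shape
    qbar = estimates.mean(axis=0)
    ubar = covariances.mean(axis=0)                       # within covariance
    diffs = estimates - qbar
    b = diffs.T @ diffs / (m - 1)                         # between covariance
    r = (1 + 1 / m) * np.trace(b @ np.linalg.inv(ubar)) / k
    d1 = qbar @ np.linalg.inv(ubar) @ qbar / (k * (1 + r))
    return d1, r   # refer d1 to an F(k, v) distribution to obtain a p-value

rng = np.random.default_rng(11)
est = rng.normal(0.2, 0.05, size=(20, 3))                 # toy MI estimates
cov = np.broadcast_to(np.eye(3) * 0.01, (20, 3, 3)).copy()
print(d1_statistic(est, cov))
```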
