Similar Literature
20 similar articles found.
1.
Missing data are a very common problem in medical and social studies, especially when data are collected longitudinally, and it is challenging to utilize the observed data effectively. Many papers on missing data problems can be found in the statistical literature. It is well known that inverse probability weighted estimation is neither efficient nor robust; the doubly robust (DR) method, on the other hand, can improve both efficiency and robustness. DR estimation requires a missing data model (i.e., a model for the probability that data are observed) and a working regression model (i.e., a model for the outcome variable given covariates and surrogate variables). Because the DR estimating function has mean zero for any parameters in the working regression model when the missing data model is correctly specified, in this paper we derive a formula for the estimator of the parameters of the working regression model that yields the optimally efficient estimator of the marginal mean model (the parameters of interest) when the missing data model is correctly specified. The proposed method also inherits the DR property. Simulation studies demonstrate the greater efficiency of the proposed method compared with the standard DR method, and a longitudinal dementia data set is used for illustration. Copyright © 2013 John Wiley & Sons, Ltd.
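The double robustness construction at the heart of this line of work can be made concrete with a toy example. Below is a minimal Python sketch of the standard augmented inverse probability weighted (AIPW) estimator of a marginal mean, not the authors' optimized variant; the simulated data, model choices, and variable names are illustrative assumptions.

```python
# Minimal sketch of the standard doubly robust (AIPW) estimator of a mean
# with outcomes missing at random. Simulated data; illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 1))
y = 2.0 + 1.5 * x[:, 0] + rng.normal(size=n)     # true outcome process
p_obs = 1 / (1 + np.exp(-(0.5 + x[:, 0])))       # true missingness process
r = rng.binomial(1, p_obs)                       # r = 1 if y is observed

# Working models: pi(x) for P(observed | x) and m(x) for E[y | x].
pi_hat = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]
m_hat = LinearRegression().fit(x[r == 1], y[r == 1]).predict(x)

# DR estimating function: consistent for E[y] if either working model holds.
mu_dr = np.mean(r * y / pi_hat + (1 - r / pi_hat) * m_hat)
print(mu_dr)  # close to the true marginal mean, 2.0
```

The paper's contribution is to choose the working regression parameters so that the resulting DR estimator is optimally efficient when the missingness model is correct, rather than plugging in an arbitrary fit as above.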

2.
In longitudinal studies, missing observations occur commonly, and it is well known that biased results can be produced if missingness is not properly handled in the analysis. Many methods have been developed that focus on either incomplete responses or missing covariate observations, but rarely on both; modeling complexity and computational difficulty are the major challenges in handling missingness in both response and covariate variables. In this paper, we develop methods using the pairwise likelihood formulation to handle longitudinal binary data with missing observations in both response and covariate variables. We propose a unified framework to accommodate various types of missing data patterns and evaluate the performance of the methods empirically under a variety of circumstances, investigating in particular issues of efficiency and robustness. We apply our methods to longitudinal data from the National Population Health Study. Copyright © 2012 John Wiley & Sons, Ltd.

3.
Studies with longitudinal measurements are common in clinical research. Particular interest lies in studies where the repeated measurements are used to predict a time-to-event outcome, such as mortality, in a dynamic manner. If event rates in a study are low, however, and most information is expected to come from the patients experiencing the study endpoint, it may be more cost efficient to use only a subset of the data. One way of achieving this is a case-cohort design, which selects all cases and only a random sample of the noncases. In the standard analysis of a case-cohort design, the noncases who were not selected are completely excluded from the analysis, and the resulting overrepresentation of the cases leads to bias. We propose to include survival information of all patients from the cohort in the analysis. We treat the absence of longitudinal information for a subset of the patients as a missing data problem and argue that the missingness mechanism is missing at random; hence, results obtained from an appropriate model, such as a joint model, should remain valid. Simulations indicate that our method performs similarly to fitting the model on the full cohort, both in terms of parameter estimates and predictions of survival probabilities. Fitting the model to the classical version of the case-cohort design shows clear bias and worse predictive performance. The procedure is further illustrated with data from BIOMArCS, a biomarker study of acute coronary syndrome patients.
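The sampling scheme itself is simple to express. The following minimal Python sketch draws a case-cohort sample (all cases plus a random subcohort); the cohort size, event rate, and sampling fraction are illustrative assumptions.

```python
# Minimal sketch of case-cohort sampling: keep every case plus a random
# subcohort. Sizes and rates are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
is_case = rng.binomial(1, 0.03, size=n).astype(bool)  # low event rate

subcohort = rng.random(n) < 0.10          # 10% random subcohort of everyone
selected = is_case | subcohort            # longitudinal data collected here only

# Selection depends only on the observed case status, so treating the
# unselected patients' longitudinal data as missing at random is defensible.
print(f"cases: {is_case.sum()}, extra noncases: {(selected & ~is_case).sum()}")
```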

4.
Most implementations of multiple imputation (MI) of missing data are designed for simple rectangular data structures and ignore the temporal ordering of the data. Therefore, when applying MI to longitudinal data with intermittent patterns of missing data, alternative strategies must be considered. One approach is to divide the data into time blocks and implement MI independently at each block. An alternative approach is to include all time blocks in the same MI model; with increasing numbers of time blocks, however, this approach is likely to break down because of collinearity and over-fitting. The new two-fold fully conditional specification (FCS) MI algorithm addresses these issues by conditioning only on measurements that are local in time. We describe and report the results of a novel simulation study to critically evaluate the two-fold FCS algorithm and its suitability for imputation of longitudinal electronic health records. After generating a full data set, approximately 70% of selected continuous and categorical variables were made missing completely at random in each of ten time blocks, and a simple time-to-event model was then applied. We compared the efficiency of estimated coefficients from a complete-records analysis, MI of data in the baseline time block only, and the two-fold FCS algorithm. The results show that the two-fold FCS algorithm maximises the use of the available data, with the gain relative to baseline MI depending on the strength of correlations within and between variables. Using this approach also increases the plausibility of the missing at random assumption, by using repeated measures over time of variables whose baseline values may be missing. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
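The algorithm's defining restriction, imputing each time block conditional only on its temporal neighbours, can be sketched as follows. This toy Python version uses single regression imputation per pass as a stand-in for proper multiple imputation; the data, dimensions, and window width are illustrative assumptions.

```python
# Toy sketch of the two-fold FCS restriction: impute block t using only
# adjacent blocks. Single (not multiple) imputation; illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n, T = 500, 10
truth = rng.normal(size=(n, 1)) + rng.normal(scale=0.5, size=(n, T))
data = truth.copy()
mask = rng.random((n, T)) < 0.3            # ~30% missing per time block
data[mask] = np.nan

imputed = np.where(np.isnan(data), np.nanmean(data, axis=0), data)  # crude start
for _ in range(5):                          # among-time iterations
    for t in range(T):
        miss = np.isnan(data[:, t])
        window = [s for s in (t - 1, t + 1) if 0 <= s < T]  # local in time
        X = imputed[:, window]
        fit = LinearRegression().fit(X[~miss], data[~miss, t])
        imputed[miss, t] = fit.predict(X[miss])
print(np.corrcoef(truth[mask], imputed[mask])[0, 1])  # recovery of masked values
```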

5.
We propose a transition model for analysing data from complex longitudinal studies. Because missing values are practically unavoidable in large longitudinal studies, we also present a two-stage imputation method for handling general patterns of missing values on both the outcome and the covariates, combining multiple imputation with stochastic regression imputation. Our model is a time-varying auto-regression on the past innovations (residuals); it can be used where general dynamics must be taken into account and where model selection is important. The entire estimation process was carried out using available procedures in statistical packages such as SAS and S-PLUS. To illustrate the viability of the proposed model and the two-stage imputation method, we analyse data collected in an epidemiological study of various factors relating to childhood growth. Finally, we present a simulation study investigating the behaviour of the two-stage imputation procedure.

6.
The recent biostatistical literature contains a number of methods for handling the bias caused by 'informative censoring', which refers to drop-out from a longitudinal study after a number of visits scheduled at predetermined intervals. The same or related methods can be extended to situations where the missing pattern is intermittent. The pattern of missingness is often assumed to be related to the outcome through random effects which represent unmeasured individual characteristics such as health awareness. To date there is only limited experience with applying the methods for informative censoring in practice, mostly because of complicated modelling and difficult computations. In this paper, we propose an estimation method based on grouping the data. The proposed estimator is asymptotically unbiased in various situations under informative missingness. Several existing methods are reviewed and compared in simulation studies. We apply the methods to data from the Wisconsin Diabetes Registry Project, a longitudinal study tracking glycaemic control and acute and chronic complications from the diagnosis of type I diabetes.

7.
We propose a propensity score-based multiple imputation (MI) method to handle incomplete data resulting from drop-outs and/or intermittently skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails because of convergence problems, whereas MI is free of that problem. Finally, if the data are missing at random, the CC and GEE methods yield results with moderate to large bias, while MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.
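As a rough, single-visit sketch of the propensity-score idea (a simplification, not the authors' full longitudinal procedure): model the probability of missingness, stratify subjects by that propensity, and impute within strata via the approximate Bayesian bootstrap; proper MI would repeat the imputation m times and pool. All names and data below are illustrative assumptions.

```python
# Sketch: propensity-score stratified imputation with an approximate
# Bayesian bootstrap (ABB) for a binary response. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=(n, 1))                                  # covariate
y = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0]))).astype(float)
miss = rng.random(n) < 1 / (1 + np.exp(-(x[:, 0] - 1)))      # MAR missingness
y[miss] = np.nan

prop = LogisticRegression().fit(x, miss).predict_proba(x)[:, 1]
strata = np.digitize(prop, np.quantile(prop, [0.2, 0.4, 0.6, 0.8]))

y_imp = y.copy()
for s in np.unique(strata):
    obs = np.where(~miss & (strata == s))[0]
    mis = np.where(miss & (strata == s))[0]
    pool = rng.choice(obs, size=obs.size, replace=True)      # ABB: resample donors
    y_imp[mis] = y[rng.choice(pool, size=mis.size)]          # then draw imputations
print(np.nanmean(y), y_imp.mean())    # complete-case mean vs imputed-data mean
```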

8.
The Collaborative Ankle Support Trial (CAST) is a longitudinal trial of treatments for severe ankle sprains in which interest lies in the rate of improvement, the effectiveness of reminders, and potentially informative missingness. A model is proposed for continuous longitudinal data with non-ignorable (informative) missingness, taking into account the nature of attempts made to contact initial non-responders. The model combines a non-linear mixed model for the outcome with logistic regression models for the reminder processes. A sensitivity analysis is used to contrast this model with the traditional selection model, in which we adjust for missingness by modelling the missingness process. The conclusions (that recovery is slower and less satisfactory with increasing age, and more rapid with a below-knee cast than with a tubular bandage) do not alter materially across the models investigated. The results also suggest that phone calls are the most effective means of retrieving questionnaires.

9.
The missing data problem is common in studies with longitudinal or hierarchical structure. In this paper, we propose a correlated random-effects model to fit normal longitudinal or clustered data when the missingness mechanism is nonignorable. Computational challenges arise in the model fitting because of intractable numerical integrations. We obtain the parameter estimates based on an accurate approximation of the log likelihood, which has higher-order accuracy but a lower computational burden than the existing approximation. We apply the proposed method to a real data set arising from an autism study. Copyright © 2009 John Wiley & Sons, Ltd.

10.
Last observation carried forward (LOCF) and analysis using only data from subjects who complete a trial (Completers) are commonly used techniques for analysing data in clinical trials with incomplete data when the endpoint is change from baseline at the last scheduled visit. We propose two alternative methods. The semi-parametric method, which cumulates changes observed between consecutive time points, is conceptually similar to the familiar life-table method and corresponding Kaplan-Meier estimation when the primary endpoint is time to event. A non-parametric analogue of LOCF is obtained by carrying forward, not the observed value, but the rank of the change from baseline at the last observation for each subject; we refer to this as the LRCF method. Both procedures retain the simplicity of the LOCF and Completers analyses and, like those methods, do not require data imputation or modelling assumptions. In the absence of any incomplete data they reduce to the usual two-sample tests. In simulations intended to reflect chronic diseases that one might encounter in practice, LOCF produced markedly biased estimates and markedly inflated type I error rates when censoring was unequal in the two treatment arms. These problems did not arise with the Completers, Cumulative Change, or LRCF methods. Cumulative Change and LRCF were more powerful than Completers, and the Cumulative Change test provided more efficient estimates than the Completers analysis, in all simulations. We conclude that the Cumulative Change and LRCF methods are preferable to the LOCF and Completers analyses. Mixed model repeated measures (MMRM) performed similarly to Cumulative Change and LRCF and made somewhat less restrictive assumptions about the missingness mechanism, so it is also a reasonable alternative to the LOCF and Completers analyses.
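The contrast between carrying a value forward (LOCF) and carrying a rank forward (LRCF) can be made concrete in a few lines. The rank computation below is a simplified illustration (a real implementation must handle ties and the comparability of ranks across visits), and the data are illustrative assumptions.

```python
# Sketch: LOCF vs rank-based LRCF for a change-from-baseline endpoint.
import numpy as np

rng = np.random.default_rng(4)
n, T = 8, 5
y = np.cumsum(rng.normal(size=(n, T)), axis=1)   # repeated measures
y[2, 3:] = np.nan                                # two subjects drop out early
y[5, 2:] = np.nan

change = y - y[:, [0]]                           # change from baseline
last = np.array([np.where(~np.isnan(r))[0][-1] for r in change])

locf = change[np.arange(n), last]                # LOCF: carry the value forward

# LRCF: carry forward the rank of each subject's last observed change among
# all changes observed at that same visit (no tie handling here).
lrcf = np.empty(n)
for i in range(n):
    col = change[:, last[i]]
    obs = ~np.isnan(col)
    lrcf[i] = np.sum(col[obs] < col[i]) + 1
print(locf.round(2), lrcf)
```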

11.
Huang Y, Dagne G, Wu L. Statistics in Medicine 2011; 30(24): 2930-2946.
Normality (symmetry) of the model random errors is a routine assumption for mixed-effects models in many longitudinal studies, but it may be unrealistic, obscuring important features of subject variations. Covariates are usually introduced in the models to partially explain inter-subject variations, but some covariates, such as CD4 cell count, are often measured with substantial error. This paper formulates a general class of models that allows the model errors to have skew-normal distributions for the joint behavior of longitudinal dynamic processes and a time-to-event process of interest. For estimating the model parameters, we propose a Bayesian approach that jointly models the three components (response, covariate, and time-to-event processes), linked through random effects that characterize the underlying individual-specific longitudinal processes. We discuss in detail special cases of the model class, which jointly model HIV dynamic response in the presence of a CD4 covariate process with measurement errors and the time to decrease in the CD4/CD8 ratio, providing a tool to assess antiretroviral treatment and to monitor disease progression. We illustrate the proposed methods using data from a clinical trial of HIV treatment. The findings suggest that joint models with a skew-normal distribution may provide more reliable and robust results if the data exhibit skewness, and the results may be particularly important for HIV/AIDS studies in providing quantitative guidance to better understand virologic responses to antiretroviral treatment.

12.
Missing outcome data and incomplete uptake of randomised interventions are common problems that complicate the analysis and interpretation of randomised controlled trials, and they are rarely addressed well in practice. To promote the implementation of recent methodological developments, we describe sequences of randomisation-based analyses that can be used to explore both issues. We illustrate these in an Internet-based trial evaluating a new interactive website for those seeking help to reduce their alcohol consumption, in which the primary outcome was available for fewer than half of the participants and uptake of the intervention was limited. For missing outcome data, we first employ data on intermediate outcomes and intervention use to make a missing at random assumption more plausible, with analyses based on generalized estimating equations, mixed models, and multiple imputation. We then use data on the ease of obtaining outcome data, together with sensitivity analyses, to explore departures from the missing at random assumption. For incomplete uptake of randomised interventions, we estimate structural mean models using instrumental variable methods. In the alcohol trial, there is no evidence of benefit unless rather extreme assumptions are made about the missing data, nor of an important benefit in more extensive users of the intervention. These findings considerably aid the interpretation of the trial's results. More generally, the analyses proposed are applicable to many trials with missing outcome data or incomplete intervention uptake. To facilitate use by others, Stata code is provided for all methods.
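The structural mean model analysis rests on using randomisation as an instrument for actual intervention use. A minimal sketch of the simplest such estimator, the Wald ratio, is below; the simulated confounder and effect sizes are illustrative assumptions, and the paper's Stata-based analyses are more elaborate.

```python
# Sketch: randomisation Z as an instrument for uptake D; Wald estimator
# cov(Z, Y) / cov(Z, D). Simulated data; illustrative only.
import numpy as np

rng = np.random.default_rng(5)
n = 4000
u = rng.normal(size=n)                     # unmeasured confounder of D and Y
z = rng.binomial(1, 0.5, size=n)           # randomised arm
d = rng.binomial(1, 1 / (1 + np.exp(-(2 * z + u - 1))))   # incomplete uptake
y = 1.0 * d + u + rng.normal(size=n)       # true causal effect of use = 1.0

naive = y[d == 1].mean() - y[d == 0].mean()    # confounded as-treated contrast
wald = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]
print(f"naive: {naive:.2f}, IV (Wald): {wald:.2f}")
```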

13.
Background: Joint modeling of longitudinal and time-to-event data is often advantageous over separate longitudinal or time-to-event analyses, as it can account for study dropout, error in longitudinally measured covariates, and correlation between longitudinal and time-to-event outcomes. The current literature on joint modeling focuses mainly on the analysis of single studies, with few methods available for the meta-analysis of joint data from multiple studies. Methods: We investigate a variety of one-stage methods for the meta-analysis of joint longitudinal and time-to-event outcome data. These methods are applied to the INDANA dataset to investigate longitudinally measured systolic blood pressure jointly with each of time to death, time to myocardial infarction, and time to stroke. Results are compared to separate longitudinal or time-to-event meta-analyses, and a simulation study contrasts separate versus joint analyses over a range of scenarios. Results: The performance of the examined one-stage joint meta-analytic models varied. Models that accounted for between-study heterogeneity performed better than models that ignored it. Among the examined methods for accounting for between-study heterogeneity, under the examined association structure, fixed-effect approaches appeared preferable, whereas methods involving a baseline hazard stratified by study were the least time intensive. Conclusions: One-stage joint meta-analytic models that accounted for between-study heterogeneity using a mix of fixed effects or a stratified baseline hazard were reliable; however, models that included study-level random effects in the association structure were less reliable.

14.
It is common for longitudinal clinical trials to face problems of item non-response, unit non-response, and drop-out. In this paper, we compare two alternative methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression. One approach combines hot-deck (HD) multiple imputation using a predictive mean matching method for item non-response with the approximate Bayesian bootstrap for unit non-response. The second method is based on a multivariate normal (MVN) model using PROC MI in SAS software V8.2. These two methods are contrasted with a last-observation-carried-forward (LOCF) technique and available-case (AC) analysis in a simulation study in which replicate analyses are performed on subsets of the originally complete cases. Missing-data patterns were simulated to be consistent with those found in the originally incomplete cases, and the observed complete-data means were taken to be the targets of estimation. Not surprisingly, the LOCF and AC methods had poor coverage properties for many of the variables evaluated. Multiple imputation under the MVN model performed well for most variables but produced less than nominal coverage for variables with highly skewed distributions. The HD method consistently produced close to nominal coverage, with interval widths roughly 7 per cent larger on average than those produced by the MVN model.
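A minimal sketch of the predictive mean matching hot deck used for item non-response follows; the regression model, donor-pool size of 5, and simulated data are illustrative assumptions rather than the paper's exact implementation.

```python
# Sketch: hot-deck imputation by predictive mean matching (PMM).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(size=(n, 2))
y = x @ np.array([1.0, -0.5]) + rng.normal(scale=0.8, size=n)
miss = rng.random(n) < 0.25
y_obs = y.copy()
y_obs[miss] = np.nan

fit = LinearRegression().fit(x[~miss], y_obs[~miss])
pred = fit.predict(x)                      # predicted means for everyone
donors = np.where(~miss)[0]

y_imp = y_obs.copy()
for i in np.where(miss)[0]:
    # find the 5 donors with the closest predicted mean, draw one at random
    near = donors[np.argsort(np.abs(pred[donors] - pred[i]))[:5]]
    y_imp[i] = y_obs[rng.choice(near)]
print(y[miss].mean(), y_imp[miss].mean())  # truth vs PMM imputations
```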

15.
We extend the marginalized transition model of Heagerty to accommodate non-ignorable monotone drop-out. Using a selection model, weakly identified drop-out parameters are held constant and their effects evaluated through sensitivity analysis. For data missing at random (MAR), the efficiency of inverse-probability-of-censoring-weighted generalized estimating equations (IPCW-GEE) can be as low as 40 per cent relative to a likelihood-based marginalized transition model (MTM) of comparable modelling burden. MTM and IPCW-GEE regression parameters both display misspecification bias for MAR and non-ignorable missing data, and both reduce bias noticeably as model fit improves.
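The IPCW half of the comparison hinges on inverse-probability-of-censoring weights; a minimal sketch of constructing them under monotone drop-out is shown below, with an assumed logistic drop-out model and simulated data (illustrative assumptions throughout).

```python
# Sketch: cumulative inverse-probability-of-censoring weights (IPCW) under
# monotone drop-out that depends on the previous (observed) outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n, T = 2000, 4
y = np.cumsum(rng.normal(size=(n, T)), axis=1)

observed = np.ones((n, T), dtype=bool)
for t in range(1, T):                      # higher y[t-1] -> more drop-out
    stay = rng.random(n) < 1 / (1 + np.exp(-(1.5 - 0.5 * y[:, t - 1])))
    observed[:, t] = observed[:, t - 1] & stay

w = np.ones((n, T))
for t in range(1, T):
    at_risk = observed[:, t - 1]
    fit = LogisticRegression().fit(y[at_risk, t - 1:t], observed[at_risk, t])
    p_stay = fit.predict_proba(y[:, t - 1:t])[:, 1]
    w[:, t] = w[:, t - 1] / p_stay         # cumulative weight

m = observed[:, -1]                        # weighted vs naive completer mean
print(np.average(y[m, -1], weights=w[m, -1]), y[m, -1].mean(), y[:, -1].mean())
```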

16.
We propose a sensitivity analysis to evaluate the extent to which results from a longitudinal study can be affected by informative drop-outs. The method is based on a selection model in which the parameter relating the drop-out probability to the current observation is not estimated but fixed to a set of values, allowing several hypotheses for the degree of informativeness of the drop-out process to be evaluated. The expectation and variance of the missing data, conditional on the drop-out time, are computed, and a stochastic EM algorithm is used to obtain maximum likelihood estimates. Simulations show that when the drop-out parameter is correctly specified, unbiased estimates of the other parameters are obtained, and the coverage percentages of their confidence intervals are close to their theoretical values; more interestingly, misspecification of the drop-out parameter does not considerably alter these results. The method was applied to a randomized clinical trial designed to demonstrate non-inferiority of an inhaled corticosteroid, compared with a reference treatment, in terms of bone density. The sensitivity analysis showed that the conclusion of non-inferiority was robust against different hypotheses for the drop-out process.
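The role of the fixed drop-out parameter can be seen in a toy simulation. If the selection model is, say, logit P(drop-out) = alpha + delta * y_current, then delta indexes the informativeness of drop-out and is not identified from the observed data, which is why the method fixes it over a grid and re-estimates everything else at each value. The sketch below (all values are illustrative assumptions) shows only the first half of that logic: how the naive observed-data mean becomes increasingly biased as delta grows.

```python
# Toy illustration: the fixed parameter delta controls how informative
# drop-out is; a sensitivity analysis would refit the model per delta.
import numpy as np

rng = np.random.default_rng(8)
n = 20_000
for delta in [0.0, 0.5, 1.0, 2.0]:        # hypothesised informativeness
    y = rng.normal(size=n)                # current values, true mean 0
    drop = rng.random(n) < 1 / (1 + np.exp(-(-1.0 + delta * y)))
    print(f"delta={delta}: naive mean of observed = {y[~drop].mean():+.3f}")
```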

17.
There is tremendous current interest in measuring multiple types of omics features (e.g., DNA sequences, RNA expression, methylation profiles, metabolic profiles, protein expression) on a large number of subjects. Although genotypes are typically available for all study subjects, other data types may be measured only on a subset of subjects because of cost or other constraints. In addition, quantitative omics measurements, such as metabolite levels and protein expressions, are subject to detection limits: measurements below (or above) certain thresholds are not detectable. In this article, we propose a rigorous and powerful approach to handling missing values and detection limits in the integrative analysis of multiomics data. We relate quantitative omics variables to genetic variants and other variables through linear regression models, and we relate phenotypes to quantitative omics variables and other variables through generalized linear models. We derive the joint likelihood for the two sets of models, allowing arbitrary patterns of missing values and detection limits for the quantitative omics variables, and carry out maximum-likelihood estimation through computationally fast and stable algorithms. The resulting estimators are approximately unbiased and statistically efficient. An application to a major study of chronic obstructive lung disease yielded new biological insights.

18.
A number of methods for analysing longitudinal ordinal categorical data with missing-at-random drop-outs are considered. Two are maximum-likelihood methods (MAXLIK) that employ marginal global odds ratios to model associations. The remainder use weighted or unweighted generalized estimating equations (GEE). Two of the GEE methods use Cholesky-decomposed standardized residuals to model the association structure, while another three extend methods developed for longitudinal binary data in which the association structures are modelled using Gaussian estimation, multivariate normal estimating equations, or conditional residuals. Simulated data sets were used to examine differences among the methods in terms of bias, variance, and convergence rates when the association structure is misspecified, and the methods were also applied to a real medical data set. Two of the GEE methods, referred to as Cond and ML-norm in this paper and by their originators, were found to have relatively good convergence rates and mean squared errors for all sample sizes considered (80, 120, 300), and a third, referred to as MGEE in this paper and by its originators, worked fairly well for all but the smallest sample size, 80.

19.
A mixed-effects model is proposed to jointly analyze multivariate longitudinal data with continuous, proportion, count, and binary responses. The association among the variables is modeled through correlated random effects. We use a quasi-likelihood-type approximation for the nonlinear variables and transform the proposed model into a multivariate linear mixed model framework for estimation and inference. Via an extension of the EM approach, an efficient algorithm is developed to fit the model. The method is applied to physical activity data collected with a wearable accelerometer that measures daily movement and energy expenditure, and the approach is also evaluated in a simulation study.

20.
We examine the behaviour of the variance-covariance parameter estimates in an alternating binary Markov model with misclassification. Transition probabilities specify the state transitions for a process that is not directly observable. The state of an observable process, which may not correctly classify the state of the unobservable process, is obtained at discrete time points. Misclassification probabilities capture the two types of classification errors. Variance components of the estimated transition parameters are calculated with three estimation procedures: observed information, jackknife, and bootstrap techniques. Simulation studies are used to compare variance estimates and reveal the effect of misclassification on transition parameter estimation. The three approaches generally provide similar variance estimates for large samples and moderate misclassification. In these situations, the resampling methods are reasonable alternatives when programming partial derivatives is not appealing. With smaller chains or higher misclassification probabilities, the bootstrap method appears to be the best choice.
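A minimal sketch of the bootstrap option follows, resampling whole chains so that within-chain dependence is preserved; for simplicity it omits the misclassification layer, and all settings are illustrative assumptions.

```python
# Sketch: bootstrap SE for a two-state Markov transition probability,
# resampling whole chains. No misclassification modelled here.
import numpy as np

rng = np.random.default_rng(9)
P = np.array([[0.8, 0.2], [0.3, 0.7]])    # true transition matrix
n_chains, T = 200, 20

def simulate_chain():
    s = [rng.integers(2)]
    for _ in range(T - 1):
        s.append(rng.choice(2, p=P[s[-1]]))
    return np.array(s)

def estimate_p01(chains):                 # pooled estimate of P(0 -> 1)
    pairs = np.concatenate([np.stack([c[:-1], c[1:]], axis=1) for c in chains])
    from_0 = pairs[pairs[:, 0] == 0]
    return (from_0[:, 1] == 1).mean()

chains = [simulate_chain() for _ in range(n_chains)]
boot = [estimate_p01([chains[i] for i in rng.integers(n_chains, size=n_chains)])
        for _ in range(500)]
print(estimate_p01(chains), np.std(boot))  # point estimate and bootstrap SE
```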

