Similar articles
20 similar articles found.
1.
Missing data are a common problem in clinical and epidemiological research, especially in longitudinal studies. Despite many methodological advances in recent decades, many papers on clinical trials and epidemiological studies do not report using principled statistical methods to accommodate missing data or use ineffective or inappropriate techniques. Two refined techniques are presented here: generalized estimating equations (GEEs) and weighted generalized estimating equations (WGEEs). These techniques are an extension of generalized linear models to longitudinal or clustered data, where observations are no longer independent. They can appropriately handle missing data when the missingness is completely at random (GEE and WGEE) or at random (WGEE) and do not require the outcome to be normally distributed. Our aim is to describe and illustrate with a real example, in a simple and accessible way to researchers, these techniques for handling missing data in the context of longitudinal studies subject to dropout and show how to implement them in R. We apply them to assess the evolution of health‐related quality of life in coronary patients in a data set subject to dropout. Copyright © 2016 John Wiley & Sons, Ltd.

2.
We propose a propensity score-based multiple imputation (MI) method to tackle incomplete data resulting from drop-outs and/or intermittent skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.

3.
Existing methods for power analysis for longitudinal study designs are limited in that they do not adequately address random missing data patterns. Although the pattern of missing data can be assessed during data analysis, it is unknown during the design phase of a study. The random nature of the missing data pattern adds another layer of complexity in addressing missing data for power analysis. In this paper, we model the occurrence of missing data with a two-state, first-order Markov process and integrate the modelling information into the power function to account for random missing data patterns. The Markov model is easily specified to accommodate different anticipated missing data processes. We develop this approach for the two most popular longitudinal models: the generalized estimating equations (GEE) and the linear mixed-effects model under the missing completely at random (MCAR) assumption. For GEE, we also limit our consideration to the working independence correlation model. The proposed methodology is illustrated with numerous examples that are motivated by real study designs.
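The two-state, first-order Markov description of missingness in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' power function: the function names, the two transition probabilities (`p_miss_given_obs`, `p_miss_given_miss`) and the choice to treat the baseline visit as always observed are all hypothetical.

```python
import random


def simulate_missingness(n_visits, p_miss_given_obs, p_miss_given_miss, rng):
    """Return 0/1 flags (1 = observed) for one subject's visits.

    Assumption: the baseline visit is always observed. Each later visit
    depends only on the state of the previous visit (first-order Markov).
    """
    observed = [1]
    for _ in range(1, n_visits):
        if observed[-1] == 1:
            p_miss = p_miss_given_obs
        else:
            p_miss = p_miss_given_miss
        observed.append(0 if rng.random() < p_miss else 1)
    return observed


def expected_observed_fraction(n_subjects, n_visits, p_miss_given_obs,
                               p_miss_given_miss, seed=0):
    """Monte Carlo estimate of the fraction of planned visits observed."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_subjects):
        total += sum(simulate_missingness(
            n_visits, p_miss_given_obs, p_miss_given_miss, rng))
    return total / (n_subjects * n_visits)
```

Setting `p_miss_given_miss` to 1.0 makes the missing state absorbing, which gives monotone drop-out; values below 1.0 allow intermittent missed visits. The simulated fraction of observed visits is the kind of quantity that would feed into an effective sample size at the design stage.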

4.
We propose a marginal modeling approach to estimate the association between a time-dependent covariate and an outcome in longitudinal studies where some study participants die during follow-up and both variables have non-monotone response patterns. The proposed method is an extension of weighted estimating equations that allows the outcome and covariate to have different missing-data patterns. We present methods for both random and non-random missing-data mechanisms. A study of functional recovery in a cohort of elderly female hip-fracture patients motivates the approach.

5.
The generalized estimating equations (GEE) approach is commonly used to model incomplete longitudinal binary data. When drop-outs are missing at random through dependence on observed responses (MAR), GEE may give biased parameter estimates in the model for the marginal means. A weighted estimating equations approach gives consistent estimation under MAR when the drop-out mechanism is correctly specified. In this approach, observations or person-visits are weighted inversely proportional to their probability of being observed. Using a simulation study, we compare the performance of unweighted and weighted GEE in models for time-specific means of a repeated binary response with MAR drop-outs. Weighted GEE resulted in smaller finite sample bias than GEE. However, when the drop-out model was misspecified, weighted GEE sometimes performed worse than GEE. Weighted GEE with observation-level weights gave more efficient estimates than a weighted GEE procedure with cluster-level weights.
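The observation-level weighting described in the abstract above can be sketched as follows for monotone drop-out. This is a minimal illustration, not the authors' procedure: `observation_weights` and `ipw_mean` are hypothetical names, and the conditional probabilities of staying in the study are assumed to come from a separately fitted drop-out model (for example a logistic regression on previously observed responses).

```python
def observation_weights(stay_probs):
    """Inverse-probability weights for one subject's observed visits.

    stay_probs[j] is the (assumed, externally estimated) probability of
    being observed at visit j given observation at visit j - 1, with
    stay_probs[0] = 1.0 for baseline. The weight at visit j is the inverse
    of the cumulative product of these probabilities up to visit j.
    """
    weights = []
    cumulative = 1.0
    for p in stay_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("each probability must lie in (0, 1]")
        cumulative *= p
        weights.append(1.0 / cumulative)
    return weights


def ipw_mean(values, weights):
    """Weighted mean of observed responses at one visit across subjects."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)
```

For a subject with conditional probabilities 1.0, 0.8 and 0.5, the weights grow across visits because later person-visits stand in for more subjects who dropped out. A cluster-level scheme would instead assign a single weight to all of a subject's visits.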

6.
A number of methods for analysing longitudinal ordinal categorical data with missing-at-random drop-outs are considered. Two are maximum-likelihood methods (MAXLIK) which employ marginal global odds ratios to model associations. The remainder use weighted or unweighted generalized estimating equations (GEE). Two of the GEE use Cholesky-decomposed standardized residuals to model the association structure, while another three extend methods developed for longitudinal binary data in which the association structures are modelled using either Gaussian estimation, multivariate normal estimating equations or conditional residuals. Simulated data sets were used to discover differences among the methods in terms of biases, variances and convergence rates when the association structure is misspecified. The methods were also applied to a real medical data set. Two of the GEE methods, referred to as Cond and ML-norm in this paper and by their originators, were found to have relatively good convergence rates and mean squared errors for all sample sizes (80, 120, 300) considered, and one more, referred to as MGEE in this paper and by its originators, worked fairly well for all but the smallest sample size, 80.

7.
The analysis of data from longitudinal studies requires special techniques, which take into account the fact that the repeated measurements within one individual are correlated. In this paper, the two most commonly used techniques to analyze longitudinal data are compared: generalized estimating equations (GEE) and random coefficient analysis. Both techniques were used to analyze a longitudinal dataset with six measurements on 147 subjects. The purpose of the example was to analyze the relationship between serum cholesterol and four predictor variables, i.e., physical fitness at baseline, body fatness (measured by sum of the thickness of four skinfolds), smoking and gender. The results showed that for a continuous outcome variable, GEE and random coefficient analysis gave comparable results, i.e., GEE-analysis with an exchangeable correlation structure and random coefficient analysis with only a random intercept were the same. There was also no difference between the two techniques in the analysis of a dataset with missing data, even when the missing data were highly selective on earlier observed data. For a dichotomous outcome variable, the magnitude of the regression coefficients and standard errors was higher when calculated with random coefficient analysis than when calculated with GEE-analysis. Analysis of a dataset with missing data with a dichotomous outcome variable showed unpredictable results for both GEE and random coefficient analysis. It can be concluded that for a continuous outcome variable, GEE and random coefficient analysis are comparable. Longitudinal data-analysis with dichotomous outcome variables should, however, be interpreted with caution, especially when there are missing data.

8.
The generalized estimating equation (GEE), a distribution‐free, or semi‐parametric, approach for modeling longitudinal data, is used in a wide range of behavioral, psychotherapy, pharmaceutical drug safety, and healthcare‐related research studies. Most popular methods for assessing model fit are based on the likelihood function for parametric models, rendering them inappropriate for distribution‐free GEE. One rare exception is a score statistic initially proposed by Tsiatis for logistic regression (1980) and later extended by Barnhart and Williamson to GEE (1998). Because GEE only provides valid inference under the missing completely at random assumption and missing values arising in most longitudinal studies do not follow such a restricted mechanism, this GEE‐based score test has very limited applications in practice. We propose extensions of this goodness‐of‐fit test to address missing data under the missing at random assumption, a more realistic model that applies to most studies in practice. We examine the performance of the proposed tests using simulated data and demonstrate the utilities of such tests with data from a real study on geriatric depression and associated medical comorbidities. Copyright © 2013 John Wiley & Sons, Ltd.

9.
A variable is ‘systematically missing’ if it is missing for all individuals within particular studies in an individual participant data meta‐analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure–disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta‐analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between‐study random effects in the imputation model. Copyright © 2013 John Wiley & Sons, Ltd.

10.
Linear regression is one of the most popular statistical techniques. In linear regression analysis, missing covariate data occur often. A recent approach to analysing such data is a weighted estimating equation. With weighted estimating equations, the contribution to the estimating equation from a complete observation is weighted by the inverse 'probability of being observed'. In this paper, we propose a weighted estimating equation in which we wrongly assume that the missing covariates are multivariate normal, but which still produces consistent estimates as long as the probability of being observed is correctly modelled. In simulations, these weighted estimating equations appear to be highly efficient when compared to the most efficient weighted estimating equation as proposed by Robins et al. and Lipsitz et al. However, these weighted estimating equations, in which we wrongly assume that the missing covariates are multivariate normal, are much less computationally intensive than the weighted estimating equations given by Lipsitz et al. We compare the weighted estimating equations proposed in this paper to the efficient weighted estimating equations via an example and a simulation study. We only consider missing data which are missing at random; non-ignorably missing data are not addressed in this paper.
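The basic idea of weighting each complete observation by the inverse of its probability of being observed can be sketched for a single-covariate linear regression. This is the plain inverse-probability-weighted complete-case estimator, not the multivariate-normal working-model estimator proposed in the abstract; the function name and the assumption that the observation probabilities are supplied by a separately fitted model are hypothetical.

```python
def ipw_simple_regression(x, y, observed, prob_observed):
    """Solve the weighted estimating equations for y = b0 + b1 * x.

    The equations are sum_i (r_i / pi_i) * (1, x_i)' * (y_i - b0 - b1 * x_i) = 0,
    where r_i = 1 if the covariate x_i is observed and pi_i is the (assumed
    known or previously estimated) probability that it is observed.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for xi, yi, ri, pi in zip(x, y, observed, prob_observed):
        if not ri:
            continue  # incomplete cases contribute nothing to this estimator
        w = 1.0 / pi
        sw += w
        swx += w * xi
        swy += w * yi
        swxx += w * xi * xi
        swxy += w * xi * yi
    denom = sw * swxx - swx * swx
    if denom == 0.0:
        raise ValueError("covariate has no variation among complete cases")
    b1 = (sw * swxy - swx * swy) / denom
    b0 = (swy - b1 * swx) / sw
    return b0, b1
```

Because only complete cases enter the sums, this estimator discards the information in the partially observed records; more efficient weighted estimating equations add a term that recovers some of it.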

11.
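Objective: To apply generalized estimating equations (GEE) and quasi-least squares (QLS) to longitudinal data from a community health service center, to examine issues in longitudinal data analysis, and to provide a sound method for analyzing community follow-up data. Methods: Longitudinal blood glucose data collected from diabetic patients at a community health service center were analyzed with GEE, QLS, and a traditional linear regression model, and the results were compared. Standardized residual plots from the three methods were also compared. Results: When GEE did not converge, its results matched those of the traditional linear model and indicated that blood glucose in diabetic patients was associated with education level; when GEE converged, its results matched those of QLS and indicated that education was not statistically significant. The standardized residual plots showed that GEE and QLS fitted the data better than traditional regression. Conclusion: Both GEE and QLS handle longitudinal data effectively. Compared with GEE, QLS has some advantages.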

12.
Tu XM, Feng C, Kowalski J, Tang W, Wang H, Wan C, Ma Y. Statistics in Medicine, 2007, 26(22): 4116-4138.
Correlation analysis is widely used in biomedical and psychosocial research for assessing rater reliability, precision of diagnosis and accuracy of proxy outcomes. The popularity of longitudinal study designs has propelled the proliferation in recent years of new methods for longitudinal and other multi-level clustered data designs, such as the mixed-effect models and generalized estimating equations. Despite these advances, research and methodological development on addressing missing data for correlation analysis is woefully lacking. In this paper, we consider non-parametric inference for the product-moment correlation within a longitudinal data setting and address missing data under both the missing completely at random and missing at random assumptions. We illustrate the approach with real study data in mental health and HIV prevention research.

13.
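Application of generalized estimating equations to longitudinal data (cited by: 1; self-citations: 1; citations by others: 1)
Objective: To explore how to fit generalized estimating equations to longitudinal data and to provide a methodological reference for longitudinal data analysis. Methods: A worked example is used to illustrate the characteristics of longitudinal data analysis and the shortcomings of traditional methods, and generalized estimating equations are applied to address the practical problems. Results: Duration of medication was positively associated with clinical efficacy (P < 0.0001). The difference in clinical efficacy between the trial group and the control group was statistically significant (P = 0.0413), with the trial group performing better. Conclusion: Generalized estimating equations offer certain advantages in longitudinal data analysis.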

14.
Generalized estimating equations (GEE) are commonly employed for the analysis of correlated data. However, the quadratic inference function (QIF) method is increasing in popularity because of its multiple theoretical advantages over GEE. We base our focus on the fact that the QIF method is more efficient than GEE when the working covariance structure for the data is misspecified. It has been shown that because of the use of an empirical weighting covariance matrix inside its estimating equations, the QIF method's realized estimation performance can potentially be inferior to GEE's when the number of independent clusters is not large. We therefore propose an alternative weighting matrix for the QIF, which asymptotically is an optimally weighted combination of the empirical covariance matrix and its model‐based version, which is derived by minimizing its expected quadratic loss. Use of the proposed weighting matrix maintains the large‐sample advantages the QIF approach has over GEE and, as shown via simulation, improves small‐sample parameter estimation. We also illustrate the proposed method in the analysis of a longitudinal study. Copyright © 2012 John Wiley & Sons, Ltd.

15.
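Log-linear models for longitudinal count data (cited by: 4; self-citations: 0; citations by others: 4)
Longitudinal data are obtained by repeatedly observing a variable on the same individuals over time. This paper analyzes longitudinal count data with generalized linear models, taking full account of the correlation among repeated observations. Methods: Using the generalized estimating equations proposed by Zeger and Liang, a log-linear generalized linear model is fitted, a dispersion parameter is introduced, and three covariance matrix structures are discussed. Results: Estimates of the regression, correlation and dispersion parameters were obtained simultaneously, a practical program was implemented, and a worked example is discussed. Conclusion: Longitudinal data are frequently encountered in medical research and clinical trials; for such …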

16.
Aims: Missing health-related quality of life (HRQOL) data in clinical trials can impact conclusions but the effect has not been thoroughly studied in HIV clinical trials. Despite repeated recommendations to avoid complete case (CC) analysis and last observation carried forward (LOCF), these approaches are commonly used to handle missing data. The goal of this investigation is to describe the use of different analytic methods under assumptions of missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) using HIV as an empirical example. Methods: Medical Outcomes Study HIV (MOS-HIV) Health Survey data were combined from two large open-label multinational HIV clinical trials comparing treatments A and B over 48 weeks. Inclusion in the HRQOL analysis required completion of the MOS-HIV at baseline and at least one follow-up visit (weeks 8, 16, 24, 40, 48). Primary outcomes for the analysis were change from week 0 to 48 in mental health summary (MHS), physical health summary (PHS), pain and health distress scores analyzed using CC, LOCF, generalized estimating equations (GEE), direct likelihood and sensitivity analyses using joint mixed-effects model, and Markov chain Monte Carlo (MCMC) multiple imputation. Time and treatment were included in all models. Baseline and longitudinal variables (adverse event and reason for discontinuation) were only used in the imputation model. Results: A total of 511 patients randomized to treatment A and 473 to treatment B completed the MOS-HIV at baseline and at least one follow-up visit. At week 48, 71% of patients on treatment A and 31% on treatment B completed the MOS-HIV survey. Examining changes within each treatment group, CC and MCMC generally produced the largest or most positive changes. The joint model was most conservative; direct likelihood and GEE produced intermediate results; LOCF showed no consistent trend. There was greater spread for within-group changes than between-group differences (within MHS scores for treatment A: −0.1 to 1.6, treatment B: 0.4 to 2.0; between groups: −0.7 to 0.4; within PHS scores for treatment A: −1.5 to 0.4, treatment B: −1.7 to −0.2; between groups: 0.1 to 1.1). The size of within-group changes and between-group differences was of similar magnitude for the pain and health distress scores. In all cases, the range of estimates was small, <0.2 SD (less than 2 points for the summary scores and 5 points for the subscale scores). Conclusions: Use of the recommended likelihood-based models that do not require assumptions of MCAR was very feasible. Sensitivity analyses using auxiliary information can help to investigate the potential effect that missing data have on results but require planning to ensure that relevant data are prospectively collected.

17.
Analysis of health care cost data is often complicated by a high level of skewness, heteroscedastic variances and the presence of missing data. Most of the existing literature on cost data analysis has focused on modeling the conditional mean. In this paper, we study a weighted quantile regression approach for estimating the conditional quantiles of health care cost data with missing covariates. The weighted quantile regression estimator is consistent, unlike the naive estimator, and asymptotically normal. Furthermore, we propose a modified BIC for variable selection in quantile regression when the covariates are missing at random. The quantile regression framework allows us to obtain a more complete picture of the effects of the covariates on the health care cost and is naturally adapted to the skewness and heterogeneity of the cost data. The method is semiparametric in the sense that it does not require specifying the likelihood function for the random error or the covariates. We investigate the weighted quantile regression procedure and the modified BIC via extensive simulations. We illustrate the application by analyzing a real data set from a health care cost study. Copyright © 2013 John Wiley & Sons, Ltd.

18.
Liang and Zeger proposed a generalized estimating equations approach to the analysis of longitudinal data. Their models assume that missing observations are missing completely at random in the sense of Rubin. However, when this assumption does not hold, their analysis may yield biased results. In this paper, we develop a simple and practical procedure for testing this assumption. The proposed procedure is related to that of Park and Davis. © 1997 John Wiley & Sons, Ltd.

19.
We extend the marginalized transition model of Heagerty to accommodate non-ignorable monotone drop-out. Using a selection model, weakly identified drop-out parameters are held constant and their effects evaluated through sensitivity analysis. For data missing at random (MAR), efficiency of inverse probability of censoring weighted generalized estimating equations (IPCW-GEE) is as low as 40 per cent compared to a likelihood-based marginalized transition model (MTM) with comparable modelling burden. MTM and IPCW-GEE regression parameters both display misspecification bias for MAR and non-ignorable missing data, and both reduce bias noticeably by improving model fit.

20.
The method of generalized estimating equations (GEE) models the association between the repeated observations on a subject with a patterned correlation matrix. Correct specification of the underlying structure is a potentially beneficial goal, in terms of improving efficiency and enhancing scientific understanding. We consider two sets of criteria that have previously been suggested, respectively, for selecting an appropriate working correlation structure, and for ruling out a particular structure(s), in the GEE analysis of longitudinal studies with binary outcomes. The first selection criterion chooses the structure for which the model‐based and the sandwich‐based estimator of the covariance matrix of the regression parameter estimator are closest, while the second selection criterion chooses the structure that minimizes the weighted error sum of squares. The rule‐out criterion deselects structures for which the estimated correlation parameter violates standard constraints for binary data that depend on the marginal means. In addition, we remove structures from consideration if their estimated parameter values yield an estimated correlation structure that is not positive definite. We investigate the performance of the two sets of criteria using both simulated and real data, in the context of a longitudinal trial that compares two treatments for major depressive episode. Practical recommendations are also given on using these criteria to aid in the efficient selection of a working correlation structure in GEE analysis of longitudinal binary data. Copyright © 2009 John Wiley & Sons, Ltd.
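The constraint on binary correlations that depends on the marginal means can be made concrete with a short sketch. It relies only on the fact that the joint probability P(Y_i = 1, Y_j = 1) = p_i p_j + rho * sqrt(p_i (1 - p_i) p_j (1 - p_j)) must lie between max(0, p_i + p_j - 1) and min(p_i, p_j). The function names are hypothetical, and this is not claimed to be the exact rule-out procedure evaluated in the abstract.

```python
import math


def binary_correlation_bounds(p_i, p_j):
    """Feasible range of the correlation between two binary outcomes.

    p_i and p_j are the marginal means (success probabilities). The bounds
    follow from requiring the implied joint probability of (1, 1) to lie
    in [max(0, p_i + p_j - 1), min(p_i, p_j)].
    """
    for p in (p_i, p_j):
        if not 0.0 < p < 1.0:
            raise ValueError("marginal means must lie strictly between 0 and 1")
    sd = math.sqrt(p_i * (1.0 - p_i) * p_j * (1.0 - p_j))
    joint_low = max(0.0, p_i + p_j - 1.0)
    joint_high = min(p_i, p_j)
    return (joint_low - p_i * p_j) / sd, (joint_high - p_i * p_j) / sd


def violates_bounds(rho, p_i, p_j, tol=1e-12):
    """True if an estimated correlation is infeasible for these means."""
    low, high = binary_correlation_bounds(p_i, p_j)
    return rho < low - tol or rho > high + tol
```

With equal means of 0.5 the full range from -1 to 1 is feasible, but when the means are far apart (say 0.1 and 0.9) the upper bound shrinks sharply, so a working structure whose estimated correlation exceeds it would be flagged for such a pair of time points.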

