1.

Impact of missing data due to drop-outs on estimators for rates of change in longitudinal studies: a simulation study.

G Touloumi A G Babiker S J Pocock J H Darbyshire 《Statistics in medicine》2001,20(24):3715-3728

Many cohort studies and clinical trials are designed to compare rates of change over time in one or more disease markers in several groups. One major problem in such longitudinal studies is missing data due to patient drop-out. The bias and efficiency of six different methods to estimate rates of change in longitudinal studies with incomplete observations were compared: generalized estimating equation estimates (GEE) proposed by Liang and Zeger (1986); unweighted average of ordinary least squares estimates (OLSE) of individual rates of change (UWLS); weighted average of OLSE (WLS); conditional linear model estimates (CLE), a covariate-type estimate proposed by Wu and Bailey (1989); random effect (RE), and joint multivariate RE (JMRE) estimates. The latter method combines a linear RE model for the underlying pattern of the marker with a log-normal survival model for the informative drop-out process. The performance of these methods in the presence of data missing completely at random (MCAR), missing at random (MAR) and non-ignorably missing (NIM) was compared in simulation studies. Data for the disease marker were generated under the linear random effects model with parameter values derived from realistic examples in HIV infection. Rates of drop-out, assumed to increase over time, were allowed to be independent of marker values or to depend either only on previous marker values or on both previous and current marker values. Under MCAR all six methods yielded unbiased estimates of both group mean rates and between-group difference. However, the cross-sectional view of the data in the GEE method resulted in seriously biased estimates under MAR and NIM drop-out processes. The bias in the estimates ranged from 30 per cent to 50 per cent. The degree of bias in the GEE estimates increases with the severity of non-randomness and with the proportion of MAR data. Under MCAR and MAR all the other five methods performed relatively well. RE and JMRE estimates were more efficient (that is, had smaller variance) than UWLS, WLS and CL estimates. Under NIM, WLS and particularly RE estimates tended to underestimate the average rate of marker change (bias approximately 10 per cent). Under NIM, UWLS, CL and JMRE performed better in terms of bias (3-5 per cent) with the JMRE giving the most efficient estimates. Given that markers are key variables related to disease progression, missing marker data are likely to be at least MAR. Thus, the GEE method may not be appropriate for analysing such longitudinal marker data. The potential biases due to incomplete data require greater recognition in reports of longitudinal studies. Sensitivity analyses to assess the effect of drop-outs on inferences about the target parameters are important.
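
As a rough illustration of the two simplest estimators compared above, the following Python sketch fits an OLS slope to each subject and then averages the slopes with and without weights. The function names and toy data are hypothetical. The weights used here (each subject's sum of squared time deviations, which is inversely proportional to the variance of that subject's OLS slope under homoscedastic errors) are an assumption about one plausible weighting, not necessarily the scheme used in the paper.

```python
from statistics import mean


def ols_slope(times, values):
    """Return (slope, sxx) for one subject's marker series."""
    t_bar = mean(times)
    y_bar = mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx, sxx


def average_slopes(subjects):
    """Unweighted and sxx-weighted averages of individual OLS slopes.

    `subjects` is a list of (times, values) pairs. Subjects with fewer
    than two distinct visit times are skipped: no slope can be fitted.
    """
    fits = [ols_slope(t, y) for t, y in subjects if len(set(t)) >= 2]
    unweighted = mean(slope for slope, _ in fits)
    weighted = sum(s * w for s, w in fits) / sum(w for _, w in fits)
    return unweighted, weighted


# Hypothetical data: the second subject drops out after two visits.
toy = [([0, 1, 2, 3], [10.0, 9.0, 8.0, 7.0]), ([0, 1], [10.0, 7.0])]
uwls, wls = average_slopes(toy)
```

In this toy example the early drop-out has the steeper decline, so the unweighted average (-2.0) is pulled further toward that subject than the weighted average (about -1.18), which gives short series less influence.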

2.

Doubly robust generalized estimating equations for longitudinal data

Shaun Seaman Andrew Copas 《Statistics in medicine》2009,28(6):937-955

A popular method for analysing repeated‐measures data is generalized estimating equations (GEE). When response data are missing at random (MAR), two modifications of GEE use inverse‐probability weighting and imputation. The weighted GEE (WGEE) method involves weighting observations by their inverse probability of being observed, according to some assumed missingness model. Imputation methods involve filling in missing observations with values predicted by an assumed imputation model. WGEE are consistent when the data are MAR and the dropout model is correctly specified. Imputation methods are consistent when the data are MAR and the imputation model is correctly specified. Recently, doubly robust (DR) methods have been developed. These involve both a model for probability of missingness and an imputation model for the expectation of each missing observation, and are consistent when either is correct. We describe DR GEE, and illustrate their use on simulated data. We also analyse the INITIO randomized clinical trial of HIV therapy allowing for MAR dropout. Copyright © 2009 John Wiley & Sons, Ltd.
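
The weighting idea can be sketched in a few lines of Python. This is a deliberately simplified, single-visit analogue (an inverse-probability-weighted mean), not a full WGEE fit and not the doubly robust estimator of the paper. The function name, the toy records and the assumed observation probabilities are all hypothetical.

```python
def ipw_mean(records):
    """Inverse-probability-weighted mean of the observed responses.

    `records` is a list of (response, prob_observed) pairs, with the
    response set to None when it is missing. The probabilities are
    assumed to come from a correctly specified missingness model,
    which is not fitted here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for response, prob_observed in records:
        if response is None:
            continue  # missing responses contribute nothing directly
        weight = 1.0 / prob_observed
        weighted_sum += weight * response
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical data: low responders are observed only half the time.
toy = [(10.0, 0.5), (None, 0.5), (20.0, 1.0), (20.0, 1.0)]
estimate = ipw_mean(toy)           # weights 2, 1, 1 on the observed values
naive = (10.0 + 20.0 + 20.0) / 3   # unweighted mean of observed values
```

Each observed low responder stands in for one that was not observed, so the weighted estimate (15.0) is lower than the unweighted mean of the observed values (about 16.7).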

3.

An appraisal of methods for the analysis of longitudinal categorical data with MAR drop-outs

O'Hara Hines RJ Hines WG 《Statistics in medicine》2005,24(23):3549-3563

A number of methods for analysing longitudinal ordinal categorical data with missing-at-random drop-outs are considered. Two are maximum-likelihood methods (MAXLIK) which employ marginal global odds ratios to model associations. The remainder use weighted or unweighted generalized estimating equations (GEE). Two of the GEE use Cholesky-decomposed standardized residuals to model the association structure, while another three extend methods developed for longitudinal binary data in which the association structures are modelled using either Gaussian estimation, multivariate normal estimating equations or conditional residuals. Simulated data sets were used to discover differences among the methods in terms of biases, variances and convergence rates when the association structure is misspecified. The methods were also applied to a real medical data set. Two of the GEE methods, referred to as Cond and ML-norm in this paper and by their originators, were found to have relatively good convergence rates and mean squared errors for all sample sizes (80, 120, 300) considered, and one more, referred to as MGEE in this paper and by its originators, worked fairly well for all but the smallest sample size, 80.

4.

Marginalized transition models for longitudinal binary data with ignorable and non-ignorable drop-out

Kurland BF Heagerty PJ 《Statistics in medicine》2004,23(17):2673-2695

We extend the marginalized transition model of Heagerty to accommodate non-ignorable monotone drop-out. Using a selection model, weakly identified drop-out parameters are held constant and their effects evaluated through sensitivity analysis. For data missing at random (MAR), efficiency of inverse probability of censoring weighted generalized estimating equations (IPCW-GEE) is as low as 40 per cent compared to a likelihood-based marginalized transition model (MTM) with comparable modelling burden. MTM and IPCW-GEE regression parameters both display misspecification bias for MAR and non-ignorable missing data, and both reduce bias noticeably by improving model fit.

5.

Comparison of adaptive treatment strategies based on longitudinal outcomes in sequential multiple assignment randomized trials

Zhiguo Li 《Statistics in medicine》2017,36(3):403-415

In sequential multiple assignment randomized trials, longitudinal outcomes may be the most important outcomes of interest because this type of trial is usually conducted in areas of chronic diseases or conditions. We propose to use a weighted generalized estimating equation (GEE) approach to analyze data from such trials for comparing two adaptive treatment strategies based on generalized linear models. Although the randomization probabilities are known, we consider estimated weights in which the randomization probabilities are replaced by their empirical estimates and prove that the resulting weighted GEE estimator is more efficient than the estimators with true weights. The variance of the weighted GEE estimator is estimated by an empirical sandwich estimator. The time variable in the model can take a linear, piecewise linear, or more complicated form. This provides flexibility that is important because, in the adaptive treatment setting, the treatment changes over time and, hence, a single linear trend over the whole period of study may not be practical. Simulation results show that the weighted GEE estimators of regression coefficients are consistent regardless of the specification of the correlation structure of the longitudinal outcomes. The weighted GEE method is then applied in analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness. Copyright © 2016 John Wiley & Sons, Ltd.

6.

Analysis of pregnancy and other factors on detection of human papilloma virus (HPV) infection using weighted estimating equations for follow-up data (total citations: 4, self-citations: 0, citations by others: 4)

Ziegler A Kastner C Chang-Claude J 《Statistics in medicine》2003,22(13):2217-2233

Generalized estimating equations have been well established to draw inference for the marginal mean from follow-up data. Many studies suffer from missing data that may result in biased parameter estimates if the data are not missing completely at random. Robins and co-workers proposed using weighted estimating equations (WEE) in estimating the mean structure if drop-out occurs missing at random. We illustrate the differences between the WEE and the commonly applied available case analysis in a simulation study. We apply the WEE and reanalyse data of a longitudinal study of pregnancy and human papilloma virus (HPV) infection. We estimate the response probabilities and demonstrate that the data are not missing completely at random. Upon use of the WEE, we are able to show that pregnant women have increased odds of an HPV infection compared with non-pregnant women after delivery (p=0.027). We conclude that the WEE are useful for dealing with monotone missing data due to drop-outs in follow-up data.

7.

Analysis of incomplete longitudinal binary data using multiple imputation

Li X Mehrotra DV Barnard J 《Statistics in medicine》2006,25(12):2107-2124

We propose a propensity score-based multiple imputation (MI) method to tackle incomplete data resulting from drop-outs and/or intermittent skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.

8.

Application of robust estimating equations to the analysis of quantitative longitudinal data.

M Hu J M Lachin 《Statistics in medicine》2001,20(22):3411-3428

A model fit by general estimating equations (GEE) has been used extensively for the analysis of longitudinal data in medical studies. To some extent, GEE tries to minimize a quadratic form of the residuals, and therefore is not robust in the sense that it, like least squares estimates, is sensitive to heavy-tailed distributions, contaminated distributions and extreme values. This paper describes the family of truncated robust estimating equations and its properties for the analysis of quantitative longitudinal data. Like GEE, the robust estimating equations aim to assess the covariate effects in the generalized linear model in the complete population of observations, but in a manner that is more robust to the influence of aberrant observations. A simulation study has been conducted to compare the finite-sample performance of GEE and the robust estimating equations under a variety of error distributions and data structures. It shows that the parameter estimates based on GEE and the robust estimating equations are approximately unbiased and the type I errors of Wald tests do not tend to be inflated. GEE is slightly more efficient with pure normal data, but the efficiency of GEE declines much more quickly than the robust estimating equations when the data become contaminated or have heavy tails, which makes the robust estimating equations advantageous with non-normal data. Both GEE and the robust estimating equations are applied to a longitudinal analysis of renal function in the Diabetes Control and Complications Trial (DCCT). For this application, GEE seems to be sensitive to the working correlation specification in that different working correlation structures may lead to different conclusions about the effect of intensive diabetes treatment. On the other hand, the robust estimating equations consistently conclude that the treatment effect is highly significant no matter which working correlation structure is used. 
The DCCT Research Group also demonstrated a significant effect using a mixed-effects longitudinal model.

9.

Misspecifying the covariance structure in a linear mixed model under MAR drop-out

Christos Thomadakis Loukia Meligkotsidou Nikos Pantazis Giota Touloumi 《Statistics in medicine》2020,39(23):3027-3041

Misspecification of the covariance structure in a linear mixed model (LMM) can lead to biased estimates of population parameters under MAR drop-out. In our motivating example of modeling CD4 cell counts during untreated HIV infection, random intercept and slope LMMs are frequently used. In this article, we evaluate the performance of LMMs with specific covariance structures, in terms of bias in the fixed effects estimates, under specific MAR drop-out mechanisms, and adopt a Bayesian model comparison criterion to discriminate between the examined approaches in real-data applications. We analytically show that using a random intercept and slope structure when the true one is more complex can lead to seriously biased estimates, with the degree of bias depending on the magnitude of the MAR drop-out. Under misspecified covariance structure, we compare in terms of induced bias the approach of adding a fractional Brownian motion (BM) process on top of random intercepts and slopes with the approach of using splines for the random effects. In general, the performance of both approaches was satisfactory, with the BM model leading to smaller bias in most cases. A simulation study is carried out to evaluate the performance of the proposed Bayesian criterion in identifying the model with the correct covariance structure. Overall, the proposed method performs better than the AIC and BIC criteria under our specific simulation setting. The models under consideration are applied to real data from the CASCADE study; the most plausible model is identified by all examined criteria.

10.

Correlation structure and variable selection in generalized estimating equations via composite likelihood information criteria

Aristidis K. Nikoloulopoulos 《Statistics in medicine》2016,35(14):2377-2390

The method of generalized estimating equations (GEE) is popular in the biostatistics literature for analyzing longitudinal binary and count data. It assumes a generalized linear model for the outcome variable, and a working correlation among repeated measurements. In this paper, we introduce a viable competitor: the weighted scores method for generalized linear model margins. We weight the univariate score equations using a working discretized multivariate normal model that is a proper multivariate model. Because the weighted scores method is a parametric method based on likelihood, we propose composite likelihood information criteria as an intermediate step for model selection. The same criteria can be used for both correlation structure and variable selection. Simulation studies and the application example show that our method outperforms other existing model selection methods in GEE. From the example, it can be seen that our methods not only improve on GEE in terms of interpretability and efficiency but also can change the inferential conclusions with respect to GEE. Copyright © 2016 John Wiley & Sons, Ltd.

11.

An improved quadratic inference function for parameter estimation in the analysis of correlated data

Philip M. Westgate Thomas M. Braun 《Statistics in medicine》2013,32(19):3260-3273

Generalized estimating equations (GEE) are commonly employed for the analysis of correlated data. However, the quadratic inference function (QIF) method is increasing in popularity because of its multiple theoretical advantages over GEE. We base our focus on the fact that the QIF method is more efficient than GEE when the working covariance structure for the data is misspecified. It has been shown that because of the use of an empirical weighting covariance matrix inside its estimating equations, the QIF method's realized estimation performance can potentially be inferior to GEE's when the number of independent clusters is not large. We therefore propose an alternative weighting matrix for the QIF, which asymptotically is an optimally weighted combination of the empirical covariance matrix and its model‐based version, which is derived by minimizing its expected quadratic loss. Use of the proposed weighting matrix maintains the large‐sample advantages the QIF approach has over GEE and, as shown via simulation, improves small‐sample parameter estimation. We also illustrate the proposed method in the analysis of a longitudinal study. Copyright © 2012 John Wiley & Sons, Ltd.

12.

Sensitivity analysis for the estimation of rates of change with non-ignorable drop-out: an application to a randomized clinical trial of the vitamin D3

Matsuyama Y 《Statistics in medicine》2003,22(5):811-827

The vitamin D3 trial was a repeated measures randomized clinical trial for secondary hyperparathyroidism in haemodialysis patients where the efficacy of the vitamin D3 infusions for suppressing the secretion of parathyroid hormone (PTH) was compared among four dose groups over 12 weeks. In this trial, patients terminated the study before the scheduled end of the study due to their elevated serum calcium (Ca) level; that is, the administration of vitamin D3 was expected to cause hypercalcaemia as an adverse event. In this setting of monotone missingness, there is a potential for bias in estimation of mean rates of decline in PTH for each treatment group using the standard methods such as the generalized estimating equations (GEE) which ignore the observed past Ca histories. We estimated the treatment-group-specific mean rates of decline in PTH by the inverse probability of censoring weighted (IPCW) methods which account for the observed past histories of time-dependent factors that are both predictors of drop-out and correlated with the outcomes. The IPCW estimator can be viewed as an extension of the GEE estimator that allows for the data to be MAR but not MCAR. With missing data, it is rarely appropriate to analyse the data solely under the assumption that the missing data process is ignorable, because the assumption of ignorable missingness cannot be guaranteed to hold and is untestable from the observed data. We proposed a sensitivity analysis that examines how inference about the IPCW estimates of the treatment-group-specific mean rates of decline in PTH changes as we vary the non-ignorable selection bias parameter over a range of plausible values.

13.

Statistical analysis of correlated data using generalized estimating equations: an orientation (total citations: 16, self-citations: 0, citations by others: 16)

Hanley JA Negassa A Edwardes MD Forrester JE 《American journal of epidemiology》2003,157(4):364-375

14.

A weighted estimating equation for linear regression with missing covariate data

Parzen M Lipsitz SR Ibrahim JG Lipshultz S 《Statistics in medicine》2002,21(16):2421-2436

Linear regression is one of the most popular statistical techniques. In linear regression analysis, missing covariate data occur often. A recent approach to analyse such data is a weighted estimating equation. With weighted estimating equations, the contribution to the estimating equation from a complete observation is weighted by the inverse 'probability of being observed'. In this paper, we propose a weighted estimating equation in which we wrongly assume that the missing covariates are multivariate normal, but which still produces consistent estimates as long as the probability of being observed is correctly modelled. In simulations, these weighted estimating equations appear to be highly efficient when compared to the most efficient weighted estimating equation as proposed by Robins et al. and Lipsitz et al. However, these weighted estimating equations, in which we wrongly assume that the missing covariates are multivariate normal, are much less computationally intensive than the weighted estimating equations given by Lipsitz et al. We compare the weighted estimating equations proposed in this paper to the efficient weighted estimating equations via an example and a simulation study. We only consider missing data which are missing at random; non-ignorably missing data are not addressed in this paper.

15.

Maintaining the validity of inference in small-sample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations

Whitney P. Ford Philip M. Westgate 《Statistics in medicine》2020,39(21):2779-2792

Stepped wedge cluster trials are an increasingly popular alternative to traditional parallel cluster randomized trials. Such trials often utilize a small number of clusters and numerous time intervals, and these components must be considered when choosing an analysis method. A generalized linear mixed model containing a random intercept and fixed time and intervention covariates is the most common analysis approach. However, the sole use of a random intercept applies a constant intraclass correlation coefficient structure, which is an assumption that is likely to be violated given that stepped wedge trials (SWTs) have multiple time intervals. Alternatively, generalized estimating equations (GEE) are robust to the misspecification of the working correlation structure, although it has been shown that small-sample adjustments to standard error estimates and the use of appropriate degrees of freedom are required to maintain the validity of inference when the number of clusters is small. In this article, we show, using an extensive simulation study based on a motivating example and a more general design, that the use of GEE can maintain the validity of inference in small-sample SWTs with binary outcomes. Furthermore, we show which combinations of bias corrections to standard error estimates and degrees of freedom work best in terms of attaining nominal type I error rates.

16.

The effect of cluster size imbalance and covariates on the estimation performance of quadratic inference functions

Westgate PM Braun TM 《Statistics in medicine》2012,31(20):2209-2222

Generalized estimating equations (GEE) are commonly used for the analysis of correlated data. However, use of quadratic inference functions (QIFs) is becoming popular because it increases efficiency relative to GEE when the working covariance structure is misspecified. Although shown to be advantageous in the literature, the impacts of covariates and imbalanced cluster sizes on the estimation performance of the QIF method in finite samples have not been studied. This cluster size variation causes QIF's estimating equations and GEE to be in separate classes when an exchangeable correlation structure is implemented, causing QIF and GEE to be incomparable in terms of efficiency. When utilizing this structure and the number of clusters is not large, we discuss how covariates and cluster size imbalance can cause QIF, rather than GEE, to produce estimates with the larger variability. This occurrence is mainly due to the empirical nature of weighting QIF employs, rather than differences in estimating equations classes. We demonstrate QIF's lost estimation precision through simulation studies covering a variety of general cluster randomized trial scenarios and compare QIF and GEE in the analysis of data from a cluster randomized trial. Copyright © 2012 John Wiley & Sons, Ltd.

17.

A comparison of several approaches for choosing between working correlation structures in generalized estimating equation analysis of longitudinal binary data

Justine Shults Wenguang Sun Xin Tu Hanjoo Kim Jay Amsterdam Joseph M. Hilbe Thomas Ten‐Have 《Statistics in medicine》2009,28(18):2338-2355

The method of generalized estimating equations (GEE) models the association between the repeated observations on a subject with a patterned correlation matrix. Correct specification of the underlying structure is a potentially beneficial goal, in terms of improving efficiency and enhancing scientific understanding. We consider two sets of criteria that have previously been suggested, respectively, for selecting an appropriate working correlation structure, and for ruling out a particular structure(s), in the GEE analysis of longitudinal studies with binary outcomes. The first selection criterion chooses the structure for which the model‐based and the sandwich‐based estimator of the covariance matrix of the regression parameter estimator are closest, while the second selection criterion chooses the structure that minimizes the weighted error sum of squares. The rule-out criterion deselects structures for which the estimated correlation parameter violates standard constraints for binary data that depend on the marginal means. In addition, we remove structures from consideration if their estimated parameter values yield an estimated correlation structure that is not positive definite. We investigate the performance of the two sets of criteria using both simulated and real data, in the context of a longitudinal trial that compares two treatments for major depressive episode. Practical recommendations are also given on using these criteria to aid in the efficient selection of a working correlation structure in GEE analysis of longitudinal binary data. Copyright © 2009 John Wiley & Sons, Ltd.

18.

A robust and unified framework for estimating heritability in twin studies using generalized estimating equations

Jaron Arbet Matt McGue Saonli Basu 《Statistics in medicine》2020,39(27):3897-3913

The ‘heritability’ of a phenotype measures the proportion of trait variance due to genetic factors in a population. In the past 50 years, studies with monozygotic and dizygotic twins have estimated heritability for 17,804 traits; thus twin studies are popular for estimating heritability. Researchers are often interested in estimating heritability for non-normally distributed outcomes such as binary, count, skewed or heavy-tailed continuous traits. In these settings, the traditional normal ACE model (NACE) and Falconer's method can produce poor coverage of the true heritability. Therefore, we propose a robust generalized estimating equations (GEE2) framework for estimating the heritability of non-normally distributed outcomes. The traditional NACE and Falconer's method are derived within this unified GEE2 framework, which additionally provides robust standard errors. Although the traditional Falconer's method cannot adjust for covariates, the corresponding ‘GEE2-Falconer’ can incorporate mean and variance-level covariate effects (e.g. let heritability vary by sex or age). Given a non-normally distributed outcome, the GEE2 models are shown to attain better coverage of the true heritability compared to traditional methods. Finally, a scenario is demonstrated where NACE produces biased estimates of heritability while Falconer remains unbiased. Therefore, we recommend GEE2-Falconer for estimating the heritability of non-normally distributed outcomes in twin studies.
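
For orientation, the classical Falconer estimate referred to above is twice the difference between the monozygotic (MZ) and dizygotic (DZ) twin-pair correlations. The Python sketch below implements only that textbook formula, not the GEE2 framework proposed in the paper, and it provides no standard errors. The function names and toy twin pairs are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer's heritability estimate: 2 * (r_MZ - r_DZ).

    Each argument is a list of (twin_1, twin_2) trait values.
    """
    r_mz = pearson(*zip(*mz_pairs))
    r_dz = pearson(*zip(*dz_pairs))
    return 2.0 * (r_mz - r_dz)


# Hypothetical trait values: r_MZ = 1.0 and r_DZ = 0.5.
mz = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
dz = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
h2 = falconer_h2(mz, dz)
```

Because the estimate is a difference of sample correlations, it can fall outside the interval from 0 to 1 in small samples, which is one reason interval estimates matter in practice.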

19.

Comparison of the risk difference, risk ratio and odds ratio scales for quantifying the unadjusted intervention effect in cluster randomized trials

Ukoumunne OC Forbes AB Carlin JB Gulliford MC 《Statistics in medicine》2008,27(25):5143-5155

This paper evaluates methods for unadjusted analyses of binary outcomes in cluster randomized trials (CRTs). Under the generalized estimating equations (GEE) method the identity, log and logit link functions may be specified to make inferences on the risk difference, risk ratio and odds ratio scales, respectively. An alternative, 'cluster-level', method applies the t-test to summary statistics calculated for each cluster, using proportions, log proportions and log odds, to make inferences on the respective scales. Simulation was used to estimate the bias of the unadjusted intervention effect estimates and confidence interval coverage, generating data sets with different combinations of number of clusters, number of participants per cluster, intra-cluster correlation coefficient rho and intervention effect. When the identity link was specified, GEE had little bias and good coverage, performing slightly better than the log and logit link functions. The cluster-level method provided unbiased point estimates when proportions were used to summarize the clusters. When the log proportion and log odds were used, however, the method often had markedly large bias for two reasons: (i) bias in the modified summary statistic used for cluster-level estimation when a cluster has zero cases with the outcome of interest (arising when the number of participants sampled per cluster is small and the outcome prevalence is low) and (ii) asymptotically, the method estimates the ratio of geometric means of the cluster proportions or odds, respectively, between the trial arms rather than the ratio of arithmetic means.
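
Point (ii) above can be made concrete with a small Python sketch of the cluster-level point estimates on the three scales. It computes point estimates only (no t-test or confidence interval) and assumes every cluster proportion lies strictly between 0 and 1, so the zero-case adjustment mentioned in point (i) is not handled. The function name and toy proportions are hypothetical.

```python
from math import exp, log
from statistics import mean


def cluster_level_effects(treated, control):
    """Cluster-level point estimates on three scales.

    `treated` and `control` are lists of per-cluster outcome proportions,
    each assumed to lie strictly between 0 and 1.
    """
    risk_difference = mean(treated) - mean(control)
    # Averaging log proportions yields a ratio of geometric means.
    risk_ratio = exp(mean(log(p) for p in treated)
                     - mean(log(p) for p in control))
    # Averaging log odds yields a ratio of geometric mean odds.
    odds_ratio = exp(mean(log(p / (1 - p)) for p in treated)
                     - mean(log(p / (1 - p)) for p in control))
    return risk_difference, risk_ratio, odds_ratio


# Hypothetical trial: heterogeneous treated clusters, homogeneous controls.
treated = [0.2, 0.8]
control = [0.4, 0.4]
rd, rr, odds = cluster_level_effects(treated, control)
arithmetic_rr = mean(treated) / mean(control)
```

With these toy values the log-proportion method returns a risk ratio of 1.0, while the ratio of arithmetic means is 1.25, which illustrates how the two targets can differ when cluster proportions vary within an arm.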

20.

Weighting condom use data to account for nonignorable cluster size

Williamson JM Kim HY Warner L 《Annals of epidemiology》2007,17(8):603-607

PURPOSE: We examined the impact of weighting the generalized estimating equation (GEE) by the inverse of the number of sex acts on the magnitude of association for factors predictive of recent condom use. METHODS: Data were analyzed from a cross-sectional survey on condom use reported during vaginal intercourse during the past year among male students attending two Georgia universities. The usual GEE model was fit to the data predicting the binary act-specific response indicating whether a condom was used. A second cluster-weighted GEE model (i.e., weighting the GEE score equation by the inverse of the number of sex acts) was also fit to predict condom use. RESULTS: Study participants who engaged in a greater frequency of sex acts were less likely to report condom use, resulting in nonignorable cluster-size data. The GEE analysis weighted by sex act (usual GEE) and the GEE analysis weighted by study subject (cluster-weighted GEE) produced different estimates of the association between the covariates and condom use in last year. For example, the cluster-weighted GEE analysis resulted in a marginally significant relationship between age and condom use (odds ratio of 0.49 with 95% confidence interval (0.23-1.03) for older versus younger participants) versus a nonsignificant relationship with the usual GEE model (odds ratio of 0.67 with a 95% confidence interval of 0.28-1.60). CONCLUSIONS: The two ways of weighting the GEE score equation, by the sex act or by the respondent, may produce different results and a different interpretation of the parameters in the presence of nonignorable cluster size.
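
The contrast between the two weightings is easiest to see in the intercept-only case with an independence working correlation, where the usual GEE estimate of the marginal proportion reduces to the pooled proportion over acts and the cluster-weighted estimate reduces to the average of the per-subject proportions. The Python sketch below covers only that special case, with no covariates; the function names and toy data are hypothetical and are not taken from the survey.

```python
def act_weighted_proportion(clusters):
    """Pooled proportion: every act counts equally (usual GEE analogue)."""
    total_acts = sum(len(acts) for acts in clusters)
    return sum(sum(acts) for acts in clusters) / total_acts


def subject_weighted_proportion(clusters):
    """Mean of per-subject proportions: each act is weighted by the
    inverse of its subject's number of acts (cluster-weighted analogue)."""
    return sum(sum(acts) / len(acts) for acts in clusters) / len(clusters)


# Hypothetical data: 1 = condom used. The more active subject uses
# condoms less often, so cluster size is informative.
toy = [[1, 1], [0, 0, 0, 0, 0, 0, 0, 1]]
per_act = act_weighted_proportion(toy)          # 3 of 10 acts
per_subject = subject_weighted_proportion(toy)  # (1.0 + 0.125) / 2
```

The per-act estimate (0.30) describes a typical act, whereas the per-subject estimate (0.5625) describes a typical respondent, which is the difference in interpretation noted in the conclusions.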