Similar articles (20 results)
1.
The generalized estimating equations (GEE) approach is a general statistical method for fitting marginal models to longitudinal data in biomedical studies. The variance–covariance matrix of the regression parameter coefficients is usually estimated by a robust “sandwich” variance estimator, which does not perform satisfactorily when the sample size is small. To reduce the downward bias and improve the efficiency, several modified variance estimators have been proposed for bias‐correction or efficiency improvement. In this paper, we provide a comprehensive review of recent developments in modified variance estimators and compare their small‐sample performance theoretically and numerically through simulation and real data examples. In particular, Wald tests and t‐tests based on different variance estimators are used for hypothesis testing, and guidelines on appropriate sample sizes for each estimator are provided for preserving type I error in general cases based on numerical results. Moreover, we develop a user‐friendly R package “geesmv” incorporating all of these variance estimators for public usage in practice. Copyright © 2015 John Wiley & Sons, Ltd.

2.
The sandwich estimator in the generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster‐randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small sample properties of the GEE Wald tests using bias‐corrected sandwich estimators. Our results suggest that the GEE Wald z‐test should be avoided in the analyses of CRTs with few clusters even when bias‐corrected sandwich estimators are used. With t‐distribution approximation, the Kauermann and Carroll (KC)‐correction can keep the test size to nominal levels even when the number of clusters is as low as 10 and is robust to moderate variation in cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG)‐correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters one needs using the t‐test and KC‐correction for the CRTs with binary outcomes. The power levels as predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that, with appropriate control of type I error rates under small sample sizes, the GEE approach can be recommended for CRTs with binary outcomes because it requires fewer assumptions and is robust to misspecification of the covariance structure. Copyright © 2014 John Wiley & Sons, Ltd.
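To make the small-sample downward bias of the sandwich estimator concrete, the following is a minimal sketch for the simplest possible case, not the method of the abstract above. It assumes an intercept-only model with identity link and an independence working correlation, so the estimate is the grand mean and each cluster's leverage reduces to n_i / N. The function name `sandwich_variances` is hypothetical. The KC form inflates each cluster's squared residual sum by 1 / (1 - h_i); the Mancl and DeRouen (MD) form, included for comparison, inflates it by 1 / (1 - h_i)^2. The FG correction is not sketched here.

```python
def sandwich_variances(clusters):
    """Return (uncorrected, KC, MD) sandwich variances of the grand mean.

    Assumptions (illustrative only): intercept-only model, identity link,
    independence working correlation, at least two clusters.
    `clusters` is a list of lists of numeric outcomes, one list per cluster.
    """
    n_total = sum(len(c) for c in clusters)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    grand_mean = sum(sum(c) for c in clusters) / n_total

    base = kc = md = 0.0
    for c in clusters:
        s = sum(y - grand_mean for y in c)  # summed residual of this cluster
        h = len(c) / n_total                # cluster leverage in this special case
        base += s * s                       # uncorrected "meat" contribution
        kc += s * s / (1.0 - h)             # Kauermann-Carroll style inflation
        md += s * s / (1.0 - h) ** 2        # Mancl-DeRouen style inflation

    # The "bread" is 1 / N on each side for the intercept-only model.
    return base / n_total ** 2, kc / n_total ** 2, md / n_total ** 2


base, kc, md = sandwich_variances([[1, 2], [3, 4], [5, 9]])
print(base, kc, md)
```

With J clusters of equal size, the KC variance in this special case is exactly J / (J - 1) times the uncorrected one, and the MD variance is (J / (J - 1))^2 times it, which shows why the corrections matter most when J is small.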

3.
Three-level cluster randomized trials (CRTs) are increasingly used in implementation science, where two-fold nested correlated data arise. For example, interventions are randomly assigned to practices, and providers within the same practice who provide care to participants are trained with the assigned intervention. Teerenstra et al. proposed a nested exchangeable correlation structure that accounts for two levels of clustering within the generalized estimating equations (GEE) approach. In this article, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in three-level CRTs. Given the nested exchangeable correlation structure, we derive the asymptotic variances of the estimator of the treatment effect for different types of outcomes. When the number of clusters is small, researchers have proposed bias-corrected sandwich estimators to improve performance in two-level CRTs. We extend the variances of two bias-corrected sandwich estimators to three-level CRTs. For simplicity, equal provider and practice sizes were assumed when calculating the number of practices; however, equal sizes are not guaranteed in practice. Relative efficiency (RE) is defined as the ratio of the variance of the estimator of the treatment effect under equal provider and practice sizes to that under unequal sizes. The expressions of REs are obtained from both asymptotic variance estimation and bias-corrected sandwich estimators. Their performances are evaluated for different scenarios of provider and practice size distributions through simulation studies. Finally, a percentage increase in the number of practices is proposed to compensate for the efficiency loss from unequal provider and/or practice sizes.

4.
In generalized estimating equations (GEE), the correlation between the repeated observations on a subject is specified with a working correlation matrix. Correct specification of the working correlation structure ensures efficient estimators of the regression coefficients. Among the criteria used in practice for selecting the working correlation structure, the Rotnitzky‐Jewell criterion, the Quasi Information Criterion (QIC) and the Correlation Information Criterion (CIC) are based on the fact that if the assumed working correlation structure is correct then the model‐based (naive) and the sandwich (robust) covariance estimators of the regression coefficient estimators should be close to each other. The sandwich covariance estimator, used in defining the Rotnitzky‐Jewell, QIC and CIC criteria, is biased downward and has a larger variability than the corresponding model‐based covariance estimator. Motivated by this fact, a new criterion is proposed in this paper based on the bias‐corrected sandwich covariance estimator for selecting an appropriate working correlation structure in GEE. A comparison of the proposed and the competing criteria is shown using simulation studies with correlated binary responses. The results revealed that the proposed criterion generally performs better than the competing criteria. An example of selecting the appropriate working correlation structure is also shown using data from the Madras Schizophrenia Study. Copyright © 2015 John Wiley & Sons, Ltd.

5.
Cluster randomized designs are frequently employed in pragmatic clinical trials, which test interventions in the full spectrum of everyday clinical settings in order to maximize applicability and generalizability. In this study, we propose to directly incorporate pragmatic features into power analysis for cluster randomized trials with count outcomes. The pragmatic features considered include arbitrary randomization ratio, overdispersion, random variability in cluster size, and unequal lengths of follow-up over which the count outcome is measured. The proposed method is developed based on generalized estimating equations (GEE) and is advantageous in that the sample size formula retains a closed form, facilitating its implementation in pragmatic trials. We theoretically explore the impact of various pragmatic features on sample size requirements. An efficient jackknife algorithm is presented to address the problem of underestimated variance by the GEE sandwich estimator when the number of clusters is small. We assess the performance of the proposed sample size method through extensive simulation, and an application example to a real clinical trial is presented.

6.
In this paper we propose a sample size calculation method for testing a binomial proportion when binary observations are dependent within clusters. In estimating the binomial proportion in clustered binary data, two weighting systems have been popular: equal weights to clusters and equal weights to units within clusters. When the number of units varies cluster by cluster, the performance of these two weighting systems depends on the extent of correlation among units within each cluster. In addition to these, we also use an optimal weighting method that minimizes the variance of the estimator. A sample size formula is derived for each of the estimators with different weighting schemes. We apply these methods to the sample size calculation for the sensitivity of a periodontal diagnostic test. Simulation studies are conducted to evaluate the finite-sample performance of the three estimators. We also assess the influence of misspecified input parameter values on the calculated sample size. The optimal estimator requires equal or smaller sample sizes and is more robust to the misspecification of an input parameter than those assigning equal weights to units or clusters.
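The three weighting schemes in the abstract above can be illustrated with a short sketch. The abstract does not give the form of the optimal weights; the version below assumes an exchangeable within-cluster correlation rho, under which the variance of a cluster proportion is proportional to [1 + (n_i - 1) rho] / n_i, so inverse-variance weights are n_i / [1 + (n_i - 1) rho]. All function names are hypothetical.

```python
def weighted_proportion(successes, sizes, weights):
    """Weighted average of the cluster-level proportions x_i / n_i."""
    total = sum(weights)
    return sum(w * x / n for w, x, n in zip(weights, successes, sizes)) / total


def cluster_weights(sizes):
    """Equal weight to every cluster."""
    return [1.0 for _ in sizes]


def unit_weights(sizes):
    """Equal weight to every unit, i.e. weight n_i per cluster."""
    return [float(n) for n in sizes]


def optimal_weights(sizes, rho):
    """Inverse-variance weights, ASSUMING exchangeable correlation rho."""
    return [n / (1.0 + (n - 1) * rho) for n in sizes]


successes, sizes = [1, 6], [2, 8]
print(weighted_proportion(successes, sizes, cluster_weights(sizes)))       # equal cluster weights
print(weighted_proportion(successes, sizes, unit_weights(sizes)))          # equal unit weights
print(weighted_proportion(successes, sizes, optimal_weights(sizes, 0.5)))  # assumed optimal weights
```

Under this assumption the optimal weights reduce to unit weights when rho = 0 and to cluster weights when rho = 1, which matches the abstract's remark that the relative performance of the two popular schemes depends on the within-cluster correlation.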

7.
The generalized estimating equation (GEE) approach is popular for analyzing correlated binary data. However, the problems caused by separation in GEE remain largely unexplored. Separation created by a covariate often occurs in small correlated binary data sets, and even in large data sets with a rare outcome and/or high intra-cluster correlation and a number of influential covariates. This paper investigates the consequences of separation in GEE and addresses them by introducing a penalized GEE, termed PGEE. The PGEE is obtained by adding a Firth-type penalty term, originally proposed for the generalized linear model score equation, to the standard GEE, and is shown to achieve convergence and provide finite estimates of the regression coefficients in the presence of separation, which is often not possible in GEE. Further, a small-sample bias correction to the sandwich covariance estimator of the PGEE estimator is suggested. Simulations showed that the GEE failed to achieve convergence and/or provided infinitely large estimates of the regression coefficients in the presence of complete or quasi-complete separation, whereas the PGEE showed significant improvement by achieving convergence and providing finite estimates. Even in the presence of near-separation, the PGEE showed superior properties over the GEE. Furthermore, the bias-corrected sandwich estimator for the PGEE estimator showed substantial improvement over the standard sandwich estimator by reducing bias in estimating the type I error rate. An illustration using real data also supported the findings of the simulation. The PGEE with the bias-corrected sandwich covariance estimator is recommended for small-to-moderate samples (N ≤ 50) and can even be used for large samples if there is any evidence of separation or near-separation.

8.
In sequential multiple assignment randomized trials, longitudinal outcomes may be the most important outcomes of interest because this type of trial is usually conducted in areas of chronic diseases or conditions. We propose to use a weighted generalized estimating equation (GEE) approach to analyze data from such trials for comparing two adaptive treatment strategies based on generalized linear models. Although the randomization probabilities are known, we consider estimated weights in which the randomization probabilities are replaced by their empirical estimates and prove that the resulting weighted GEE estimator is more efficient than the estimators with true weights. The variance of the weighted GEE estimator is estimated by an empirical sandwich estimator. The time variable in the model can be linear, piecewise linear, or take more complicated forms. This provides flexibility that is important because, in the adaptive treatment setting, the treatment changes over time and, hence, a single linear trend over the whole period of study may not be practical. Simulation results show that the weighted GEE estimators of regression coefficients are consistent regardless of the specification of the correlation structure of the longitudinal outcomes. The weighted GEE method is then applied in analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness. Copyright © 2016 John Wiley & Sons, Ltd.

9.
The sandwich variance estimator of generalized estimating equations (GEE) may not perform well when the number of independent clusters is small. This could jeopardize the validity of the robust Wald test by causing inflated type I error and lower coverage probability of the corresponding confidence interval than the nominal level. Here, we investigate the small-sample performance of the robust score test for correlated data and propose several modifications to improve the performance. In a simulation study, we compare the robust score test to the robust Wald test for correlated Bernoulli and Poisson data, respectively. It is confirmed that the robust Wald test is too liberal whereas the robust score test is too conservative for small samples. To explain this puzzling operating difference between the two tests, we consider their applications to two special cases, one-sample and two-sample comparisons, thus motivating some modifications to the robust score test. A modification based on a simple adjustment to the usual robust score statistic by a factor of J/(J - 1) (where J is the number of clusters) reduces the conservativeness of the generalized score test. Simulation studies mimicking group-randomized clinical trials with binary and count responses indicated that it may improve the small-sample performance over that of the generalized score and Wald tests with test size closer to the nominal level. Finally, we demonstrate the utility of our proposal by applying it to a group-randomized clinical trial, trying alternative cafeteria options in schools (TACOS).
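The J/(J - 1) adjustment described above is a one-line rescaling, sketched here under stated assumptions: the robust score statistic is taken as already computed by some GEE fit, and the test has one degree of freedom so that a chi-square(1) reference distribution applies. The function name `adjusted_score_test` is hypothetical, and the reference distribution is an assumption of this sketch rather than something the abstract specifies.

```python
from math import erfc, sqrt


def adjusted_score_test(score_stat, n_clusters):
    """Scale a robust score statistic by J / (J - 1) and return (statistic, p-value).

    ASSUMES a single-parameter test, so the p-value comes from the
    chi-square distribution with 1 degree of freedom:
    P(X > x) = erfc(sqrt(x / 2)).
    """
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    adjusted = score_stat * n_clusters / (n_clusters - 1)
    p_value = erfc(sqrt(adjusted / 2.0))
    return adjusted, p_value


stat, p = adjusted_score_test(3.6, 10)
print(stat, p)
```

Because the factor exceeds 1, the adjusted statistic is always larger than the unadjusted one, which moves a conservative test toward its nominal size; the effect vanishes as J grows.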

10.
The sandwich standard error estimator is commonly used for making inferences about parameter estimates found as solutions to generalized estimating equations (GEE) for clustered data. The sandwich tends to underestimate the variability in the parameter estimates when the number of clusters is small, and reference distributions commonly used for hypothesis testing poorly approximate the distribution of Wald test statistics. Consequently, tests have greater than nominal type I error rates. We propose tests that use bias-reduced linearization (BRL) to adjust the sandwich estimator and Satterthwaite or saddlepoint approximations for the reference distribution of resulting Wald t-tests. We conducted a large simulation study of tests using a variety of estimators (traditional sandwich, BRL, Mancl and DeRouen's BC estimator, and a modification of an estimator proposed by Kott) and approximations to reference distributions under diverse settings that varied the distribution of the explanatory variables, the values of coefficients, and the degree of intra-cluster correlation (ICC). Our new method generally worked well, providing accurate estimates of the variability of fitted coefficients and tests with near-nominal type I error rates when the ICC is small. Our method works less well when the ICC is large, but it continues to out-perform the traditional sandwich and other alternatives.

11.
In observational studies, estimation of the average causal treatment effect on a patient's response should adjust for confounders that are associated with both treatment exposure and response. In addition, the response, such as medical cost, may have incomplete follow‐up. In this article, a double robust estimator is proposed for the average causal treatment effect for right censored medical cost data. The estimator is double robust in the sense that it remains consistent when either the model for the treatment assignment or the regression model for the response is correctly specified. Double robust estimators increase the likelihood that the results will represent a valid inference. Asymptotic normality is obtained for the proposed estimator, and an estimator for the asymptotic variance is also derived. Simulation studies show good finite sample performance of the proposed estimator and a real data analysis using the proposed method is provided as illustration. Copyright © 2016 John Wiley & Sons, Ltd.

12.
Many different methods have been proposed for the analysis of cluster randomized trials (CRTs) over the last 30 years. However, the evaluation of methods on overdispersed count data has been based mostly on the comparison of results using empirical data, i.e. when the true model parameters are not known. In this study, we assess via simulation the performance of five methods for the analysis of counts in situations similar to real community‐intervention trials. We used the negative binomial distribution to simulate overdispersed counts of CRTs with two study arms, allowing the period of time under observation to vary among individuals. We assessed different sample sizes, degrees of clustering and degrees of cluster‐size imbalance. The compared methods are: (i) the two‐sample t‐test of cluster‐level rates, (ii) generalized estimating equations (GEE) with empirical covariance estimators, (iii) GEE with model‐based covariance estimators, (iv) generalized linear mixed models (GLMM) and (v) Bayesian hierarchical models (Bayes‐HM). Variation in sample size and clustering led to differences between the methods in terms of coverage, significance, power and random‐effects estimation. GLMM and Bayes‐HM performed better in general, with Bayes‐HM producing less dispersed results for random‐effects estimates, although upward biased when clustering was low. GEE showed higher power but anticonservative coverage and elevated type I error rates. Imbalance affected the overall performance of the cluster‐level t‐test and the GEE's coverage in small samples. Important effects arising from accounting for overdispersion are illustrated through the analysis of a community‐intervention trial on Solar Water Disinfection in rural Bolivia. Copyright © 2009 John Wiley & Sons, Ltd.

13.
Generalized estimating equations (GEEs) are commonly used to estimate transition models. When the Markov assumption does not hold but first-order transition probabilities are still of interest, the transition inference is sensitive to the choice of working correlation. In this paper, we consider a random process transition model as the true underlying data generating mechanism, which characterizes subject heterogeneity and complex dependence structure of the outcome process in a very flexible way. We formally define two types of transition probabilities at the population level: “naive transition probabilities” that average across all the transitions and “population-average transition probabilities” that average the subject-specific transition probabilities. Through asymptotic bias calculations and finite-sample simulations, we demonstrate that the unstructured working correlation provides unbiased estimators of the population-average transition probabilities while the independence working correlation provides unbiased estimators of the naive transition probabilities. For population-average transition estimation, we demonstrate that the sandwich estimator fails for unstructured GEE and recommend the use of either jackknife or bootstrap variance estimates. The proposed method is motivated by and applied to the NEXT Generation Health Study, where the interest is in estimating the population-average transition probabilities of alcohol use in adolescents.
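The difference between the two transition probabilities defined above can be seen with simple empirical counts. The sketch below is only an illustration of the two averaging schemes on observed binary sequences, not the GEE estimators studied in the abstract. The function name `transition_probabilities` is hypothetical, and subjects who never occupy the starting state are dropped from the subject-level average, which is an assumption of this sketch.

```python
def transition_probabilities(sequences, start=0, end=1):
    """Return (naive, population_average) estimates of P(end | start).

    naive: pools every observed transition out of `start` across subjects.
    population_average: averages each subject's own transition proportion.
    Subjects with no transitions out of `start` are skipped (assumption).
    """
    pooled_from = pooled_to = 0
    subject_rates = []
    for seq in sequences:
        n_from = n_to = 0
        for prev, curr in zip(seq, seq[1:]):
            if prev == start:
                n_from += 1
                if curr == end:
                    n_to += 1
        pooled_from += n_from
        pooled_to += n_to
        if n_from:
            subject_rates.append(n_to / n_from)
    if not subject_rates:
        raise ValueError("no transitions out of the start state")
    naive = pooled_to / pooled_from
    population_average = sum(subject_rates) / len(subject_rates)
    return naive, population_average


naive, pop_avg = transition_probabilities([[0, 0, 0, 0, 1], [0, 1]])
print(naive, pop_avg)
```

In this toy input the first subject contributes four transitions and the second only one, so the pooled (naive) value is dominated by the first subject while the subject-level average weights both equally. The two quantities coincide only when subjects are homogeneous or contribute equally many transitions.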

14.
Often in biomedical studies, the event of interest is recurrent and within-subject events cannot usually be assumed independent. In semi-parametric estimation of the proportional rates model, a working independence assumption leads to an estimating equation for the regression parameter vector, with within-subject correlation accounted for through a robust (sandwich) variance estimator; these methods have been extended to the case of clustered subjects. We consider variance estimation in the setting where subjects are clustered and the study consists of a small number of moderate-to-large-sized clusters. We demonstrate through simulation that the robust estimator is quite inaccurate in this setting. We propose a corrected version of the robust variance estimator, as well as jackknife and bootstrap estimators. Simulation studies reveal that the corrected variance is considerably more accurate than the robust estimator, and slightly more accurate than the jackknife and bootstrap variance. The proposed methods are used to compare hospitalization rates between Canada and the U.S. in a multi-centre dialysis study. Copyright (c) 2005 John Wiley & Sons, Ltd.

15.
A case-cohort sample of adoptees was collected to investigate genetic and environmental influences on premature death, which motivated us to supplement existing simulation results to explore the performance of various estimators proposed for case-cohort samples of survival data. We studied six regression coefficient estimators, which differ with regard to the weighting scheme used in a pseudo-likelihood function, and two different estimators of their variances. Compared to earlier simulation studies, we changed the following conditions: type of explanatory variable, the distribution of lifetimes, and the percentage of deaths in the full cohort. The latter condition affected the performance of the estimated variances of the regression coefficients, where we found a systematic bias of the estimator proposed by Self and Prentice, dependent on the percentage of deaths. This dependence on the percentage of deaths was different for different sizes of case-cohort studies. A robust variance estimator showed a better overall performance. The estimators of regression coefficients compared did not differ much, the estimators proposed by Kalbfleisch and Lawless and by Prentice performing very well. Results of the case-cohort data of adoptees were not in conflict with earlier findings of a moderate genetic influence on premature death in adulthood.

16.
Genome-wide association studies (GWAS) have been frequently conducted on general or isolated populations with related individuals. However, there is a lack of consensus on which strategy is most appropriate for analyzing dichotomous phenotypes in general pedigrees. Using simulation studies, we compared several strategies, including generalized estimating equations (GEE) strategies with various working correlation structures, the generalized linear mixed model (GLMM), and a variance component strategy (denoted LMEBIN) that treats dichotomous outcomes as continuous, with special attention to their performance with rare variants, rare diseases, and small sample sizes. In our simulations, when the sample size is not small, only GEE and LMEBIN maintain nominal type I error in most cases, with exceptions for GEE with very rare disease and genetic variants. GEE and LMEBIN have similar statistical power and slightly outperform GLMM when the prevalence is low. In terms of computational efficiency, GEE with sandwich variance estimator outperforms GLMM and LMEBIN. We apply the strategies to GWAS of gout in the Framingham Heart Study. Based on our results, we would recommend using GEE ind-san in the GWAS for common variants and GEE ind-fij or LMEBIN for rare variants for GWAS of dichotomous outcomes with general pedigrees.

17.
Pan W, Wall MM. Statistics in Medicine. 2002;21(10):1429-1441.
The generalized estimating equation (GEE) approach is widely used in regression analyses with correlated response data. Under mild conditions, the resulting regression coefficient estimator is consistent and asymptotically normal with its variance being consistently estimated by the so-called sandwich estimator. Statistical inference is thus accomplished by using the asymptotic Wald chi-squared test. However, it has been noted in the literature that for small samples the sandwich estimator may not perform well and may lead to much inflated type I errors for the Wald chi-squared test. Here we propose using an approximate t- or F-test that takes account of the variability of the sandwich estimator. The level of type I error of the proposed t- or F-test is guaranteed to be no larger than that of the Wald chi-squared test. The satisfactory performance of the proposed new tests is confirmed in a simulation study. Our proposal also has some advantages when compared with other new approaches based on direct modifications of the sandwich estimator, including the one that corrects the downward bias of the sandwich estimator. In addition to hypothesis testing, our result has a clear implication on constructing Wald-type confidence intervals or regions.

18.
In this paper, we propose a hybrid variance estimator for the Kaplan-Meier survival function. This new estimator approximates the true variance by a Binomial variance formula, where the proportion parameter is a piecewise non-increasing function of the Kaplan-Meier survival function and its upper bound, as described below. Also, the effective sample size equals the number of subjects not censored prior to that time. In addition, we consider an adjusted hybrid variance estimator that modifies the regular estimator for small sample sizes. We present a simulation study to compare the performance of the regular and adjusted hybrid variance estimators to the Greenwood and Peto variance estimators for small sample sizes. We show that on average these hybrid variance estimators give variance estimates closer to the true values than the traditional variance estimators, and hence confidence intervals constructed with these hybrid variance estimators have coverage rates closer to the nominal level. Indeed, the Greenwood and Peto variance estimators can substantially underestimate the true variance in the left and right tails of the survival distribution, even with moderately censored data. Finally, we illustrate the use of these hybrid and traditional variance estimators on a data set from a leukaemia clinical trial.

19.
Populations of non-European ancestry are substantially underrepresented in genome-wide association studies (GWAS). As genetic effects can differ between ancestries due to possibly different causal variants or linkage disequilibrium patterns, a meta-analysis that includes GWAS of all populations yields biased estimation in each of the populations, and the bias disproportionately impacts non-European ancestry populations. This is because meta-analysis combines study-specific estimates with inverse variance as the weights, which causes bias towards the studies with the largest sample sizes, typically those of the European ancestry population. In this paper, we propose two empirical Bayes (EB) estimators to borrow strength across populations while accounting for between-population heterogeneity. Extensive simulation studies show that the proposed EB estimators are largely unbiased and improve efficiency compared to the population-specific estimator. In contrast, even though the meta-analysis estimator has a much smaller variance, it yields significant bias when the genetic effect is heterogeneous across populations. We apply the proposed EB estimators to a large-scale trans-ancestry GWAS of stroke and demonstrate that the EB estimators reduce the variance of the population-specific estimator substantially, with the effect estimates close to the population-specific estimates.
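The pull of inverse-variance weighting toward the largest study, described above, can be shown with a short sketch of the standard fixed-effect meta-analysis estimator. It does not implement the empirical Bayes estimators proposed in the abstract. The function name `inverse_variance_meta` and the numbers in the usage example are hypothetical.

```python
def inverse_variance_meta(estimates, variances):
    """Fixed-effect meta-analysis: return (pooled estimate, pooled variance).

    Each study-specific estimate is weighted by the inverse of its variance,
    so precise (typically large) studies dominate the pooled value.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, 1.0 / total


# Hypothetical inputs: a large study (effect 0.10, variance 0.01)
# and a small study (effect 0.50, variance 0.09).
pooled, var = inverse_variance_meta([0.10, 0.50], [0.01, 0.09])
print(pooled, var)
```

With these made-up inputs the pooled estimate lands far closer to the large study's effect than to the small study's, and its variance is smaller than either input variance. When the true effects differ, that smaller variance comes at the cost of bias for the smaller population, which is the trade-off the abstract describes.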

20.
We propose a simple correction factor for the variance of the logarithm of the common odds ratio estimated by the method of Mantel and Haenszel from a series of (2 x 2) tables when data are cluster correlated. The adjustment is applied to the variance estimators proposed by Hauck and by Robins, Breslow and Greenland for the log of the Mantel-Haenszel common odds ratio, and its performance is evaluated in a simulation study. The key features of the proposed adjustment are: (i) it has a closed form; (ii) it can accommodate covariates defined at the cluster-specific level, the site-specific level, or both; and (iii) it does not require the user to specify a particular correlation structure for the response data. The correction derives from Liang and Zeger's generalized estimating equations (GEE) technique for logistic regression modelling. Via simulation, we examine empirical versus nominal coverage probabilities for interval estimation of the common odds ratio using adjusted and unadjusted variance estimates, and we present ratios of observed to estimated variances. Results are compared to those obtained from the fully iterated GEE analysis. The characteristics of the simulation study mimic scenarios common in the periodontal research setting, with small numbers of subjects (N = 25, 50), moderate numbers of sites per cluster (m = 4, 16, 32), and modest intracluster correlation levels (rho = 0.0, 0.1, 0.2, 0.3). Results show that adjusted confidence intervals (applied to the Hauck or the Robins, Breslow and Greenland variance estimate) provide coverage probabilities close to the nominal level for the Mantel-Haenszel common odds ratio over a variety of cluster sizes and levels of correlation.
