Estimation in generalized linear mixed models (GLMMs) is often based on maximum likelihood theory, assuming that the underlying probability model is correctly specified. However, the validity of this assumption is sometimes difficult to verify. In this paper we study, through simulations, the impact of misspecifying the random-effects distribution on estimation and hypothesis testing in GLMMs. It is shown that the maximum likelihood estimators are inconsistent in the presence of misspecification. The bias induced in the mean-structure parameters is generally small, as long as the variability of the underlying random-effects distribution is small as well. However, the estimates of this variability are always severely biased. Given that the variance components are the only tool to study the variability of the true distribution, it is difficult to assess whether problems in the estimation of the mean structure occur. The type I error rate and the power of the commonly used inferential procedures are also severely affected. The situation is aggravated if more than one random effect is included in the model. Further, we propose to deal with possible misspecification by way of sensitivity analysis, considering several random-effects distributions. All the results are illustrated using data from a clinical trial in schizophrenia.

The generalised linear mixed model (GLMM) is widely used for modelling environmental data. However, such data are prone to influential observations, which can distort the estimated exposure–response curve, particularly in regions of high exposure. Deletion diagnostics for iterative estimation schemes commonly derive the deleted estimates based on a single iteration of the full system, holding certain pivotal quantities such as the information matrix constant. In this paper, we present an approximate formula for the deleted estimates and Cook's distance for the GLMM, which does not assume that the estimates of variance parameters are unaffected by deletion. The procedure allows the user to calculate standardised DFBETAs for mean as well as variance parameters. In certain cases, such as when using the GLMM as a device for smoothing, such residuals for the variance parameters are interesting in their own right. In general, the procedure leads to deleted estimates of mean parameters that are corrected for the effect of deletion on variance components, as estimation of the two sets of parameters is interdependent. The probabilistic behaviour of these residuals is investigated and a simulation-based procedure is suggested for their standardisation. The method is used to identify influential individuals in an occupational cohort exposed to silica. The results show that failure to conduct post-model-fitting diagnostics for variance components can lead to erroneous conclusions about the fitted curve and unstable confidence intervals. Copyright © 2015 John Wiley & Sons, Ltd.

Two main classes of methodology have been developed for addressing the analytical intractability of generalized linear mixed models: likelihood‐based methods and Bayesian methods. Likelihood‐based methods such as the penalized quasi‐likelihood approach have been shown to produce biased estimates, especially for binary clustered data with small cluster sizes. More recent methods using adaptive Gaussian quadrature perform well but can be overwhelmed by problems with large numbers of random effects, and efficient algorithms to better handle these situations have not yet been integrated in standard statistical packages. Bayesian methods, although they have good frequentist properties when the model is correct, are known to be computationally intensive and also require specialized code, limiting their use in practice. In this article, we introduce a modification of the hybrid approach of Capanu and Begg (2011, Biometrics 67, 371–380) as a bridge between the likelihood‐based and Bayesian approaches by employing Bayesian estimation for the variance components followed by Laplacian estimation for the regression coefficients. We investigate its performance as well as that of several likelihood‐based methods in the setting of generalized linear mixed models with binary outcomes. We apply the methods to three datasets and conduct simulations to illustrate their properties. Simulation results indicate that for moderate to large numbers of observations per random effect, adaptive Gaussian quadrature and the Laplacian approximation are very accurate, with adaptive Gaussian quadrature preferable as the number of observations per random effect increases. The hybrid approach is overall similar to the Laplace method, and it can be superior for data with very sparse random effects. Copyright © 2013 John Wiley & Sons, Ltd.

When investigating health disparities, it can be of interest to explore whether adjustment for socioeconomic factors at the neighborhood level can account for, or even reverse, an unadjusted difference. Recently, we proposed new methods to adjust the effect of an individual‐level covariate for confounding by unmeasured neighborhood‐level covariates using complex survey data and a generalization of conditional likelihood methods. Generalized linear mixed models (GLMMs) are a popular alternative to conditional likelihood methods in many circumstances. Therefore, in the present article, we propose and investigate a new adaptation of GLMMs for complex survey data that achieves the same goal of adjusting for confounding by unmeasured neighborhood‐level covariates. With the new GLMM approach, one must correctly model the expectation of the unmeasured neighborhood‐level effect as a function of the individual‐level covariates. We demonstrate using simulations that even if that model is correct, census data on the individual‐level covariates are sometimes required for consistent estimation of the effect of the individual‐level covariate. We apply the new methods to investigate disparities in recency of dental cleaning, treated as an ordinal outcome, using data from the 2008 Florida Behavioral Risk Factor Surveillance System (BRFSS) survey. We operationalize neighborhood as zip code and merge the BRFSS data with census data on ZIP Code Tabulated Areas to incorporate census data on the individual‐level covariates. We compare the new results to our previous analysis, which used conditional likelihood methods. We find that the results are qualitatively similar. Copyright © 2012 John Wiley & Sons, Ltd.

We propose statistical definitions of the individual benefit of a medical or behavioral treatment and of the severity of a chronic illness. These definitions are used to develop a graphical method that can be used by statisticians and clinicians in the data analysis of clinical trials from the perspective of personalized medicine. The method focuses on assessing and comparing individual effects of treatments rather than average effects and can be used with continuous and discrete responses, including dichotomous and count responses. The method is based on new developments in generalized linear mixed‐effects models, which are introduced in this article. To illustrate, analyses of data from the Sequenced Treatment Alternatives to Relieve Depression clinical trial of sequences of treatments for depression and data from a clinical trial of respiratory treatments are presented. The estimation of individual benefits is also explained. Copyright © 2016 John Wiley & Sons, Ltd.

Multivariate Gaussian mixtures are a class of models that provide a flexible parametric approach for the representation of heterogeneous multivariate outcomes. When the outcome is a vector of repeated measurements taken on the same subject, there is often inherent dependence between observations. However, a common covariance assumption is conditional independence: that is, given the mixture component label, the outcomes for subjects are independent. In this paper, we study, through asymptotic bias calculations and simulation, the impact of covariance misspecification in multivariate Gaussian mixtures. Although maximum likelihood estimators of regression and mixing probability parameters are not consistent under misspecification, they have little asymptotic bias when mixture components are well separated or if the assumed correlation is close to the truth even when the covariance is misspecified. We also present a robust standard error estimator and show that it outperforms conventional estimators in simulations and can indicate that the model is misspecified. Body mass index data from a national longitudinal study are used to demonstrate the effects of misspecification on potential inferences made in practice. Copyright © 2013 John Wiley & Sons, Ltd.

Generalized linear mixed models have played an important role in the analysis of longitudinal data; however, traditional approaches have limited flexibility in accommodating skewness and complex correlation structures. In addition, the existing estimation approaches generally rely heavily on the specification of the random-effects distribution; therefore, the corresponding inferences are sometimes sensitive to the choice of random-effects distribution under certain circumstances. In this paper, we incorporate serially dependent distribution‐free random effects into Tweedie generalized linear models to accommodate a wide range of skewness and covariance structures for discrete and continuous longitudinal data. An optimal estimation of our model has been developed using the orthodox best linear unbiased predictors of random effects. Our approach unifies population‐averaged and subject‐specific inferences. Our method is illustrated through the analyses of patient‐controlled analgesia data and Framingham cholesterol data.

Gibbs sampling-based generalized linear mixed models (GLMMs) provide a convenient and flexible way to extend variance components models for multivariate normally distributed continuous traits to other classes of phenotype. This includes binary traits and right-censored failure times such as age-at-onset data. The approach has applications in many areas of genetic epidemiology. However, the required GLMMs are sensitive to nonrandom ascertainment. In the absence of an appropriate correction for ascertainment, they can exhibit marked positive bias in the estimated grand mean and serious shrinkage in the estimated magnitude of variance components. To compound practical difficulties, it is currently difficult to implement a conventional adjustment for ascertainment because of the need to undertake repeated integration across the distribution of random effects. This is prohibitively slow when it must be repeated at every iteration of the Markov chain Monte Carlo (MCMC) procedure. This paper motivates a correction for ascertainment that is based on sampling random effects rather than integrating across them and can therefore be implemented in a general-purpose Gibbs sampling environment such as WinBUGS. The approach has the characteristic that it returns ascertainment-adjusted parameter estimates that pertain to the true distribution of determinants in the ascertained sample rather than in the general population. The implications of this characteristic are investigated and discussed. This paper extends the utility of Gibbs sampling-based GLMMs to a variety of settings in which family data are ascertained nonrandomly.

Linear mixed effects (LME) models are increasingly used for analyses of biological and biomedical data. When the multivariate normal assumption is not adequate for an LME model, then a robust estimation approach is preferable to the maximum likelihood one. M-estimators were considered before for robust estimation of the LME models, and recently a constrained S-estimator was proposed. This S-estimator cannot be applied directly to LME models with correlated error terms and vector random effects with correlated dimensions. Therefore, a modification is proposed, which extends application of the constrained S-estimator to the LME models for multivariate responses with correlated dimensions and to longitudinal data. Also, a new computational algorithm is developed for computing constrained S-estimators. Performance of the S-estimators based on the original Tukey's biweight and translated biweight is evaluated in a small simulation study with repeated multivariate responses with correlated dimensions. The proposed methodology is applied to jointly analyze repeated measures on three cholesterol components, HDL, LDL, and triglycerides.

The classic concordance correlation coefficient measures the agreement between two variables. In recent studies, concordance correlation coefficients have been generalized to deal with responses from a distribution in the exponential family using the univariate generalized linear mixed model. Multivariate data arise when responses on the same unit are measured repeatedly by several methods. The relationship among these responses is often of interest. In clustered mixed data, correlation could be present between repeated measurements either within the same observer or between different methods on the same subjects. Indices for measuring such association are needed. This study proposes a series of indices, namely, intra‐correlation, inter‐correlation, and total correlation coefficients, to measure the correlation under various circumstances in a multivariate generalized linear model, especially for joint modeling of clustered count and continuous outcomes. The proposed indices are natural extensions of the concordance correlation coefficient. We demonstrate the methodology with simulation studies. A case example from an osteoarthritis study is provided to illustrate the use of these proposed indices. Copyright © 2014 John Wiley & Sons, Ltd.
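
As a point of reference for the abstract above, the classic two-variable concordance correlation coefficient (usually attributed to Lin) can be sketched in a few lines. This is only the classic sample coefficient, not the intra-, inter-, or total correlation indices the study proposes; the function name is hypothetical and the moments are computed with divisor n, which is an assumption about the convention.

```python
import numpy as np


def concordance_correlation(x, y):
    """Classic sample concordance correlation coefficient for paired data.

    Illustrative sketch only: uses divisor-n moments (an assumed convention)
    and does not implement the GLMM-based extensions from the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()            # divisor n (ddof=0)
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    # Agreement is penalised both by weak correlation and by a location shift.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Unlike the Pearson correlation, the coefficient drops below 1 when one method is systematically shifted relative to the other, even if the two are perfectly linearly related.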

Multilevel and longitudinal studies are frequently subject to missing data. For example, biomarker studies for oral cancer may involve multiple assays for each participant. Assays may fail, resulting in missing data values that can be assumed to be missing completely at random. Catellier and Muller proposed a data analytic technique to account for data missing at random in multilevel and longitudinal studies. They suggested modifying the degrees of freedom for both the Hotelling–Lawley trace F statistic and its null case reference distribution. We propose parallel adjustments to approximate power for this multivariate test in studies with missing data. The power approximations use a modified non‐central F statistic, which is a function of (i) the expected number of complete cases, (ii) the expected number of non‐missing pairs of responses, or (iii) the trimmed sample size, which is the planned sample size reduced by the anticipated proportion of missing data. The accuracy of the method is assessed by comparing the theoretical results to the Monte Carlo simulated power for the Catellier and Muller multivariate test. Over all experimental conditions, the closest approximation to the empirical power of the Catellier and Muller multivariate test is obtained by adjusting power calculations with the expected number of complete cases. The utility of the method is demonstrated with a multivariate power analysis for a hypothetical oral cancer biomarkers study. We describe how to implement the method using standard, commercially available software products and give example code. Copyright © 2015 John Wiley & Sons, Ltd.
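
Two of the sample-size quantities named in the abstract above can be sketched as follows. The trimmed sample size follows the abstract's own description; the complete-case formula additionally assumes that each of the responses is missing independently with the same probability, which is an assumption made here for illustration and not necessarily the paper's definition. Function and parameter names are hypothetical.

```python
def trimmed_sample_size(n_planned, p_missing):
    """Planned sample size reduced by the anticipated proportion missing."""
    return n_planned * (1.0 - p_missing)


def expected_complete_cases(n_planned, p_missing, n_responses):
    """Expected number of participants with every response observed.

    Assumes (for illustration) that each of the n_responses values is missing
    completely at random, independently, with probability p_missing.
    """
    return n_planned * (1.0 - p_missing) ** n_responses
```

Under these assumptions the expected number of complete cases shrinks quickly as the number of responses per participant grows, so it is the more conservative of the two adjustments.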

In this article, we implement a practical computational method for various semiparametric mixed effects models, estimating nonlinear functions by penalized splines. We approximate the integration of the penalized likelihood with respect to random effects with the use of adaptive Gaussian quadrature, which we can conveniently implement in SAS procedure NLMIXED. We carry out the selection of smoothing parameters through approximated generalized cross‐validation scores. Our method has two advantages: (1) the estimation is more accurate than the currently available quasi‐likelihood method for sparse data, for example, binary data; and (2) it can be used in fitting more sophisticated models. We show the performance of our approach in simulation studies with longitudinal outcomes from three settings: binary, normal data after Box–Cox transformation, and count data with log‐Gamma random effects. We also develop an estimation method for a longitudinal two‐part nonparametric random effects model and apply it to analyze repeated measures of semicontinuous daily drinking records in a randomized controlled trial of topiramate. Copyright © 2012 John Wiley & Sons, Ltd.
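
To illustrate what integrating a likelihood over random effects by Gaussian quadrature involves, the sketch below evaluates the marginal likelihood of one cluster in a random-intercept logistic model. It uses plain (nonadaptive) Gauss–Hermite quadrature, a simpler relative of the adaptive version the abstract describes, and it has nothing to do with the SAS implementation or the penalized-spline terms. The model, function name, and arguments are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def cluster_marginal_likelihood(y, eta_fixed, sigma, n_nodes=20):
    """Marginal likelihood of one cluster, integrating out b ~ N(0, sigma^2).

    y         : 0/1 responses for the cluster
    eta_fixed : fixed-effects linear predictor for each response
    sigma     : standard deviation of the random intercept
    Nonadaptive Gauss-Hermite quadrature; an illustrative sketch only.
    """
    y = np.asarray(y, dtype=float)
    eta_fixed = np.asarray(eta_fixed, dtype=float)
    nodes, weights = hermgauss(n_nodes)        # rule for weight exp(-x^2)
    b = np.sqrt(2.0) * sigma * nodes           # change of variables to N(0, sigma^2)
    eta = eta_fixed[None, :] + b[:, None]      # shape (n_nodes, n_obs)
    p = 1.0 / (1.0 + np.exp(-eta))
    # Conditional likelihood of the whole cluster at each quadrature node.
    lik_given_b = np.prod(p ** y * (1.0 - p) ** (1.0 - y), axis=1)
    return float(np.sum(weights * lik_given_b) / np.sqrt(np.pi))
```

Summing the logarithm of this quantity over clusters gives a log-likelihood that a generic optimizer could maximize. Adaptive quadrature recentres and rescales the nodes for each cluster, which typically needs fewer nodes for the same accuracy.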

This paper introduces an exploratory way to determine how variance relates to the mean in generalized linear models. This novel method employs the robust likelihood technique introduced by Royall and Tsou. A urinary data set collected by Ginsberg et al. and the fabric data set analysed by Lee and Nelder are considered to demonstrate the applicability and simplicity of the proposed technique. Application of the proposed method could easily reveal a mean-variance relationship that would generally be left unnoticed, or that would require more complex modelling to detect.

This paper is motivated by a retrospective study of the impact of vitamin D deficiency on the clinical outcomes for critically ill patients in multi‐center critical care units. The primary predictors of interest, vitamin D2 and D3 levels, are censored at a known detection limit. Within the context of generalized linear mixed models, we investigate statistical methods to handle multiple censored predictors in the presence of auxiliary variables. A Bayesian joint modeling approach is proposed to fit the complex heterogeneous multi‐center data, in which the data information is fully used to estimate parameters of interest. Efficient Markov chain Monte Carlo algorithms are specifically developed depending on the nature of the response. Simulation studies demonstrate that the proposed Bayesian approach outperforms other existing methods. An application to the data set from the vitamin D deficiency study is presented. Possible extensions of the method regarding the absence of auxiliary variables, semiparametric models, as well as the type of censoring are also discussed. Copyright © 2015 John Wiley & Sons, Ltd.

Although change‐point analysis methods for longitudinal data have been developed, it is often of interest to detect multiple change points in longitudinal data. In this paper, we propose a linear mixed effects modeling framework for identifying multiple change points in longitudinal Gaussian data. Specifically, we develop a novel statistical and computational framework that integrates the expectation–maximization and the dynamic programming algorithms. We conduct a comprehensive simulation study to demonstrate the performance of our method. We illustrate our method with an analysis of data from a trial evaluating a behavioral intervention for the control of type I diabetes in adolescents with HbA1c as the longitudinal response variable. Copyright © 2013 John Wiley & Sons, Ltd.

This paper extends the classical linear mixed model by considering a multivariate skew-normal assumption for the distribution of random effects. We present an efficient hybrid ECME-NR algorithm for the computation of maximum-likelihood estimates of parameters. A score test statistic for testing the existence of skewness preference among random effects is developed. The technique for the prediction of future responses under this model is also investigated. The methodology is illustrated through an application to Framingham cholesterol data and a simulation study.

We introduce a model to account for abrupt changes among repeated measures with non-monotone missingness. Development of likelihood inferences for such models is hard because it involves intractable integration to obtain the marginal likelihood. We use hierarchical likelihood to overcome this difficulty. Abrupt changes among repeated measures can be well described by introducing random effects in the dispersion. A simulation study shows that the resulting estimator is efficient and robust against misspecification of the fatness of tails. For illustration we use schizophrenic behaviour data presented by Rubin and Wu.

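Objective: To explore the relationship between diabetes and related factors using generalized additive models and generalized linear models. Methods: From 2010 to 2012, permanent residents aged 18 years and older were selected as study subjects by stratified cluster sampling in 5 cities/counties of Guangxi. All 3 827 respondents completed a questionnaire survey, had their height, weight, blood pressure, and waist circumference (WC) measured, and had their fasting plasma glucose (FPG) tested. Results: The prevalence of diabetes was 9.4%; the prevalence in men and women was 10.3% and 8.8%, respectively, and the difference was not statistically significant (χ2=2.629, P=0.105). Univariate analysis showed that seven factors, namely age, urban or rural residence, ethnicity, education level, marital status, alcohol drinking, and obesity type (OBPH), were associated with diabetes (P<0.01). Multivariable logistic regression showed that rural residence, relative to urban residence, was associated with a lower risk of diabetes (OR=0.633, 95%CI: 0.499~0.802, P=0.000); people aged 60 years and older had a higher risk of diabetes than those under 35 (OR=14.037, 95%CI: 6.538~30.134, P=0.000); and central obesity with overweight and central obesity with obesity, each compared with normal weight, were associated with a higher risk of diabetes (OR=2.259, 95%CI: 1.705~2.994, P=0.000 and OR=2.068, 95%CI: 1.368~3.125, P=0.001, respectively). The generalized additive model and generalized linear model results showed a J-shaped nonlinear relationship between alcohol drinking and diabetes (χ2=7.712, P=0.019), with drinking <1 time/week and ≥6 times/week increasing the risk of diabetes; obesity type showed a horizontal ∽-shaped curve relationship with diabetes (χ2=13.547, P=0.008), with central obesity and underweight increasing the risk of diabetes. Conclusion: Generalized additive models and generalized linear models can display the nonlinear relationships of alcohol drinking and obesity type with diabetes in an intuitive way.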
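
The odds ratios and 95% confidence intervals reported above are the kind of quantities usually obtained by exponentiating a logistic regression coefficient and its Wald limits. The sketch below shows that standard calculation; whether the study computed its intervals exactly this way is an assumption, and the function name and arguments are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient.

    beta : estimated log-odds coefficient
    se   : its standard error
    z    : normal quantile (1.96 gives an approximate 95% Wald interval)
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))
```

Because the interval is symmetric on the log-odds scale, the reported limits are asymmetric around the odds ratio itself, as in the intervals quoted in the abstract.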

Multilevel mixed effects survival models are used in the analysis of clustered survival data, such as repeated events, multicenter clinical trials, and individual participant data (IPD) meta‐analyses, to investigate heterogeneity in baseline risk and covariate effects. In this paper, we extend parametric frailty models including the exponential, Weibull and Gompertz proportional hazards (PH) models and the log logistic, log normal, and generalized gamma accelerated failure time models to allow any number of normally distributed random effects. Furthermore, we extend the flexible parametric survival model of Royston and Parmar, modeled on the log‐cumulative hazard scale using restricted cubic splines, to include random effects while also allowing for non‐PH (time‐dependent effects). Maximum likelihood is used to estimate the models utilizing adaptive or nonadaptive Gauss–Hermite quadrature. The methods are evaluated through simulation studies representing clinically plausible scenarios of a multicenter trial and IPD meta‐analysis, showing good performance of the estimation method. The flexible parametric mixed effects model is illustrated using a dataset of patients with kidney disease and repeated times to infection and an IPD meta‐analysis of prognostic factor studies in patients with breast cancer. User‐friendly Stata software is provided to implement the methods. Copyright © 2014 John Wiley & Sons, Ltd.

A major, often unstated, concern of researchers carrying out epidemiological studies of medical therapy is the potential impact on validity if estimates of treatment are biased due to unmeasured confounders. One technique for obtaining consistent estimates of treatment effects in the presence of unmeasured confounders is instrumental variables analysis (IVA). This technique has been well developed in the econometrics literature and is being increasingly used in epidemiological studies. However, the approach to IVA that is most commonly used in such studies is based on linear models, while many epidemiological applications make use of non-linear models, specifically generalized linear models (GLMs) such as logistic or Poisson regression. Here we present a simple method for applying IVA within the class of GLMs using the generalized method of moments approach. We explore some of the theoretical properties of the method and illustrate its use within both a simulation example and an epidemiological study where unmeasured confounding is suspected to be present. We estimate the effects of beta-blocker therapy on one-year all-cause mortality after an incident hospitalization for heart failure, in the absence of data describing disease severity, which is believed to be a confounder.

