Similar articles
20 similar articles found
1.
To describe the spatial distribution of diseases, a number of methods have been proposed to model relative risks within areas. Most models use Bayesian hierarchical methods, in which one models both spatially structured and unstructured extra‐Poisson variance present in the data. For modelling a single disease, the conditional autoregressive (CAR) convolution model has been very popular. More recently, a combined model was proposed that ‘combines’ ideas from the CAR convolution model and the well‐known Poisson‐gamma model. The combined model was shown to be a good alternative to the CAR convolution model when there was a large amount of uncorrelated extra‐variance in the data. Fewer solutions exist for modelling two diseases simultaneously or modelling a disease in two sub‐populations simultaneously. Furthermore, existing models are typically based on the CAR convolution model. In this paper, a bivariate version of the combined model is proposed in which the unstructured heterogeneity term is split up into terms that are shared and terms that are specific to the disease or subpopulation, while spatial dependency is introduced via a univariate or multivariate Markov random field. The proposed method is illustrated by analysis of disease data in Georgia (USA) and Limburg (Belgium) and in a simulation study. We conclude that the bivariate combined model constitutes an interesting model when two diseases are possibly correlated. As the choice of the preferred model differs between data sets, we suggest using the new and existing modelling approaches together and choosing the best model via goodness‐of‐fit statistics. Copyright © 2016 John Wiley & Sons, Ltd.

2.
Disease mapping is the area of epidemiology that estimates the spatial pattern in disease risk over an extended geographical region, so that areas with elevated risk levels can be identified. Bayesian hierarchical models are typically used in this context, which represent the risk surface using a combination of available covariate data and a set of spatial random effects. These random effects are included to model any overdispersion or spatial correlation in the disease data that has not been accounted for by the available covariate information. The random effects are typically modelled by a conditional autoregressive (CAR) prior distribution, and a number of alternative specifications have been proposed. This paper critiques four of the most common models within the CAR class, and assesses their appropriateness via a simulation study. The four models are then applied to a new study mapping cancer incidence in Greater Glasgow, Scotland, between 2001 and 2005.
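
As a point of reference for the CAR class mentioned above, one widely used specification is the intrinsic CAR prior. The sketch below is illustrative only: the abstract does not say which four specifications are compared, and the symbols (phi_i for the random effect of area i, n_i for its number of neighbours, tau^2 for the conditional variance parameter) are notation assumed here.

```latex
% Intrinsic CAR prior (a sketch; notation assumed, not taken from the paper).
% j \sim i means that area j is a neighbour of area i.
\phi_i \mid \phi_{-i} \;\sim\; \mathcal{N}\!\left( \frac{1}{n_i} \sum_{j \sim i} \phi_j ,\; \frac{\tau^2}{n_i} \right),
\qquad i = 1, \dots, n .
```

Each random effect is shrunk towards the average of its neighbours, with a conditional variance that decreases as the number of neighbours grows. This is the mechanism by which such priors smooth the estimated risk surface.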

3.

Background

In medical and biomedical areas, binary and binomial outcomes are very common. On the one hand, such data are often collected repeatedly over time from a given subject, which results in clustering of the observations within subjects and hence in correlation. On the other hand, the repeated binary outcomes from a given subject constitute a binomial outcome, for which the prescribed mean-variance relationship is often violated, leading to so-called overdispersion.

Methods

Two longitudinal binary data sets collected in southwestern Ethiopia are considered: the Jimma infant growth study, in which children's early growth is studied, and the Jimma longitudinal family survey of youth, in which adolescents' school attendance is studied over time. A new model that accommodates overdispersion and correlation simultaneously, known as the combined model, is applied. For comparison, commonly used methods for binary and binomial data are also considered: the simple logistic model, which accounts for neither overdispersion nor correlation; the beta-binomial model, which accommodates only overdispersion; and the logistic-normal model, which accommodates only correlation. As an alternative estimation technique, a Bayesian implementation of the combined model is also presented.

Results

The combined model improves the fit and is hence the preferred one, based on likelihood comparison and the DIC criterion. Further, the two estimation approaches result in fairly similar parameter estimates and inferences in both of our case studies. Early initiation of breastfeeding has a protective effect against the risk of overweight in late infancy (p = 0.001), while the proportion of overweight does not appear to differ between males and females over time (p = 0.66). Gender is significantly associated with school attendance: girls have a lower rate of attendance than boys (p = 0.001).

Conclusion

We applied a flexible modeling framework to analyze binary and binomial longitudinal data. Instead of accounting for overdispersion and correlation separately, both can be accommodated simultaneously by including two separate sets of random effects, beta and normal, at once.

4.
《Annals of epidemiology》2017,27(1):59-66.e3
Purpose

To investigate the distribution of mesothelioma in Flanders using Bayesian disease mapping models that account for both an excess of zeros and overdispersion.

Methods

The numbers of newly diagnosed mesothelioma cases within all Flemish municipalities between 1999 and 2008 were obtained from the Belgian Cancer Registry. To deal with overdispersion, zero inflation, and geographical association, the hurdle combined model was proposed, which has three components: a Bernoulli zero-inflation mixture component to account for excess zeros, a gamma random effect to adjust for overdispersion, and a normal conditional autoregressive random effect to attribute spatial association. This model was compared with other existing methods in the literature.

Results

The results indicate that hurdle models with a random effects term accounting for extra variance in the Bernoulli zero-inflation component fit the data better than hurdle models that do not take overdispersion in the occurrence of zeros into account. Furthermore, traditional models that do not take into account excessive zeros but contain at least one random effects term that models extra variance in the counts have better fits compared to their hurdle counterparts. In other words, the extra variability, due to an excess of zeros, can be accommodated by spatially structured and/or unstructured random effects in a Poisson model such that the hurdle mixture model is not necessary.

Conclusions

Models taking into account zero inflation do not always provide better fits to data with excessive zeros than less complex models. In this study, a simple conditional autoregressive model identified a cluster in mesothelioma cases near a former asbestos processing plant (Kapelle-op-den-Bos). This observation is likely linked with historical local asbestos exposures. Future research will clarify this.

5.
Poisson data frequently exhibit overdispersion, and for univariate models many options exist to address this problem. Nonetheless, in complex scenarios, for example, in longitudinal studies, accounting for overdispersion is a more challenging task. Recently, Molenberghs et al. presented a model that accounts for overdispersion by combining two sets of random effects. However, introducing a new set of random effects implies additional distributional assumptions for intrinsically unobservable variables, which has not been considered before. Using the combined model as a framework, we explored the impact of ignoring overdispersion in complex longitudinal settings via simulations. Furthermore, we evaluated the effect of misspecifying the random-effects distribution on both the combined model and the classical Poisson hierarchical model. Our results indicate that even though inferences may be affected by ignored overdispersion, the combined model is a promising tool in this scenario.
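
The idea of combining two sets of random effects can be sketched as follows. This is an illustrative formulation under assumed notation, not necessarily the exact parameterization used in the paper: b_i is a normal random effect shared by all measurements of subject i (correlation), theta_ij is a gamma random effect specific to each measurement (overdispersion), and the gamma distribution is assumed here to have mean 1 and variance 1/alpha.

```latex
% Sketch of a Poisson model with two sets of random effects (notation assumed).
Y_{ij} \mid b_i, \theta_{ij} \sim \mathrm{Poisson}(\theta_{ij}\,\kappa_{ij}),
\qquad \kappa_{ij} = \exp\!\left( x_{ij}^{\top}\xi + z_{ij}^{\top} b_i \right),
\qquad b_i \sim \mathcal{N}(0, D),
\qquad \theta_{ij} \sim \mathrm{Gamma}(\text{shape } \alpha, \text{rate } \alpha).

% Conditional on b_i, by the law of total variance:
\mathrm{E}[Y_{ij} \mid b_i] = \kappa_{ij},
\qquad
\mathrm{Var}[Y_{ij} \mid b_i]
  = \kappa_{ij}\,\mathrm{E}[\theta_{ij}] + \kappa_{ij}^{2}\,\mathrm{Var}[\theta_{ij}]
  = \kappa_{ij} + \frac{\kappa_{ij}^{2}}{\alpha}.
```

The extra term kappa^2/alpha is what allows the variance to exceed the mean. As alpha grows large, the gamma effect degenerates to 1 and the classical Poisson hierarchical model with normal random effects is recovered.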

6.
A Bayesian semi-parametric model is proposed to capture the interaction among demographic effects (age and gender), spatial effects (county) and temporal effects of colorectal cancer incidence simultaneously. In particular, an extension of multivariate conditionally autoregressive (CAR) processes to a partially informative Gaussian demographic spatial temporal CAR (DSTCAR) process for a spatial-temporal setting is proposed. The precision matrix of the Gaussian DSTCAR process is the Kronecker product of several components. The spatial component is modelled with a CAR prior. A pth order intrinsic autoregressive prior (IAR(p)) is implemented for the temporal component to estimate a smoothed and non-parametric temporal trend. The demographic component is modelled with a Wishart prior. Data analysis shows that significant spatial correlation exists only in the 50-59 age group. Males and females in their 50s and 60s show fairly strong correlation. Hypothesis testing based on the Bayes factor suggests that gender correlation cannot be ignored in this model.

7.
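Objective: To investigate the correlation of the preoperative C-reactive protein to serum albumin ratio (CRP/Alb, CAR) and the change in CAR from before to after surgery (ΔCAR) with the clinicopathological characteristics of non-small cell lung cancer (NSCLC), and to assess their diagnostic value for survival prognosis. Methods: The preoperative and postoperative CRP and Alb values of 100 NSCLC patients at one hospital were analyzed retrospectively, and the CAR and ΔCAR values were calculated. The chi-square test was used to analyze the correlation of CAR and ΔCAR with the clinicopathological characteristics of the NSCLC patients, and the Kaplan-Meier method and the Cox regression model were used to analyze the relationship of CAR and ΔCAR with NSCLC prognosis. Results: A high preoperative CAR was positively correlated with age, lymph node metastasis, clinical stage, preoperative carcinoembryonic antigen (CEA) level, and high expression of carbohydrate antigen 211 (CA211) and squamous cell carcinoma antigen (SCC) (P<0.05). A high ΔCAR was negatively correlated with age, lymph node metastasis, clinical stage, and high preoperative expression of CEA, CA211 and SCC (P<0.05). Survival analysis showed that patients with a high CAR had a worse prognosis (P<0.01), whereas patients with a high ΔCAR had a better prognosis (P<0.01). The multivariate Cox regression model showed that the ΔCAR level could serve as an independent prognostic factor (hazard ratio, HR=0.595, P<0.05). Conclusion: Combined measurement of preoperative CAR and ΔCAR is efficient, sensitive, convenient, economical and minimally invasive. ΔCAR has important diagnostic value for determining the clinical stage and survival prognosis of NSCLC.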

8.
This study focuses on sample size determination in repeated measures studies with multinomial outcomes from multiple factors. In settings where multiple factors have repeated measures, a single subject could have hundreds of observations. Sample size selection may then refer to the number of subjects, the number of levels within a factor, or the number of repetitions within the level. We simulate multinomial data through a generalized linear mixed model (GLMM) with and without overdispersion, compute the empirical power of detecting group differences for several analytical methods and contrast their performance in group comparison studies with repeated multinomial data. We use four spatial functions to model the spatial correlation structures among observations. We evaluate the factors affecting the power under various scenarios. We also present a dataset typical in hearing studies for sound localization, in which a spatially distributed array of audio loudspeakers plays multiple sounds in order to compare two programming schemes for a hearing aid device.

9.
The spatial scan statistic has been widely used in spatial disease surveillance and spatial cluster detection for more than a decade. However, overdispersion is often present in real-world data, which not only violates the Poisson assumption but also causes excessive type I errors or false alarms. In order to account for overdispersion, we extend the Poisson-based spatial scan test to a quasi-Poisson-based test. The simulation shows that the proposed method can substantially reduce type I error probabilities in the presence of overdispersion. In a case study of infant mortality in Jiangxi, China, both tests detect a cluster; however, a secondary cluster is identified by only the Poisson-based test. It is recommended that a cluster detected by the Poisson-based scan test should be interpreted with caution when it is not confirmed by the quasi-Poisson-based test.

10.
The statistical analysis of spatially correlated data has recently become an important scientific research topic. The analysis of the mortality or morbidity rates observed at different areas may help to decide if people living in certain locations are considered at higher risk than others. Once the statistical model for the data of interest has been chosen, further effort can be devoted to identifying the areas at higher risk. Many scientists, including statisticians, have tried the conditional autoregressive (CAR) model to describe the spatial autocorrelation among the observed data. This model has a greater smoothing effect than the exchangeable models, such as the Poisson gamma model for spatial data. This paper focuses on comparing the two types of models using the index LG, the ratio of local to global variability. Two applications, Taiwan asthma mortality and Scotland lip cancer, are considered and the use of LG is illustrated. The estimated values for both data sets are small, implying a Poisson gamma model may be favoured over the CAR model. We discuss the implications for the two applications respectively. To evaluate the performance of the index LG, we also compute the Bayes factor, a Bayesian model selection criterion, to see which model is preferred for the two applications and simulation data. To derive the value of LG, we estimate its posterior mode based on samples derived from the BUGS program, while for the Bayes factor we use the double Laplace-Metropolis method, the Schwarz criterion, and a modified harmonic mean for approximations. The results of LG and the Bayes factor are consistent. We conclude that LG is fairly accurate as an index for selection between the Poisson gamma and CAR models. When easy and fast computation is of concern, we recommend using LG as the first and less costly index.

11.

Objective

To propose a more realistic model for disease cluster detection, through a modification of the spatial scan statistic to account simultaneously for inflated zeros and overdispersion.

Introduction

Spatial Scan Statistics [1] usually assume Poisson or Binomial distributed data, which is not adequate in many disease surveillance scenarios. For example, small areas distant from hospitals may exhibit a smaller number of cases than expected in those simple models. Also, underreporting may occur in underdeveloped regions, due to inefficient data collection or the difficulty of accessing remote sites. Those factors generate excess zero case counts or overdispersion, inducing a violation of the statistical model and also increasing the type I error (false alarms). Overdispersion occurs when the data variance is greater than that predicted by the model used. To accommodate it, an extra parameter must be included, because the Poisson model constrains the variance to equal the mean.

Methods

Tools like the Generalized Poisson (GP) and the Double Poisson [2] may be a better option for this kind of problem, modeling separately the mean and variance, which could be easily adjusted by covariates. When excess zeros occur, the Zero Inflated Poisson (ZIP) model is used, although ZIP’s estimated parameters may be severely biased if nonzero counts are too dispersed, compared to the Poisson distribution. In this case the zero-inflated models for the Generalized Poisson (ZIGP), Double Poisson (ZIDP) and Negative Binomial (ZINB) could be good alternatives for the joint modeling of excess zeros and overdispersion. On the one hand, Zero Inflated Poisson (ZIP) models were proposed using the spatial scan statistic to deal with the excess zeros [3]. On the other hand, another spatial scan statistic was based on a Poisson-Gamma mixture model for overdispersion [4]. In this work we present a model which includes inflated zeros and overdispersion simultaneously, based on the ZIDP model. Let the parameter p indicate the zero inflation. As the remaining parameters of the observed cases map and the parameter p are not independent, the likelihood maximization process is not straightforward; it becomes even more complicated when we include covariates in the analysis. To solve this problem we introduce a vector of latent variables in order to factorize the likelihood, which facilitates the maximization process using the E-M (Expectation-Maximization) algorithm. We derive the formulas to maximize the likelihood iteratively, and implement a computer program using the E-M algorithm to estimate the parameters under the null and alternative hypotheses. The p-value is obtained via the Fast Double Bootstrap Test [5].

Results

Numerical simulations are conducted to assess the effectiveness of the method. We present results for Hanseniasis surveillance in the Brazilian Amazon in 2010 using this technique. We obtain the most likely spatial clusters for the Poisson, ZIP, Poisson-Gamma mixture and ZIDP models and compare the results.

Conclusions

The Zero Inflated Double Poisson Spatial Scan Statistic for disease cluster detection incorporates the flexibility of previous models, accounting for inflated zeros and overdispersion simultaneously. The Hanseniasis case study map, due to the excess of zero case counts in many municipalities of the Brazilian Amazon and the presence of overdispersion, was a good benchmark to test the ZIDP model. The results obtained are easier to understand compared to each of the previous spatial scan statistic models, the Zero Inflated Poisson (ZIP) model and the Poisson-Gamma mixture model for overdispersion, taken separately. The E-M algorithm and the Fast Double Bootstrap test are computationally efficient for this type of problem.

12.
MacNab YC  Dean CB 《Statistics in medicine》2000,19(17-18):2421-2435
This paper discusses a variety of conditional autoregressive (CAR) models for mapping disease rates, beyond the usual first-order intrinsic CAR model. We illustrate the utility and scope of such models for handling different types of data structures. To encourage their routine use for map production at statistical and health agencies, a simple algorithm for fitting such models is presented. This is derived from penalized quasi-likelihood (PQL) inference which uses an analogue of best-linear unbiased estimation for the regional risk ratios and restricted maximum likelihood for the variance components. We offer the practitioner here the use of the parametric bootstrap for inference. It is more reliable than standard maximum likelihood asymptotics for inference purposes since relevant hypotheses for the mapping of rates lie on the boundary of the parameter space. We illustrate the parametric bootstrap test of the practically relevant and important simplifying hypothesis that there is no spatial autocorrelation. Although the parametric bootstrap requires computational effort, it is straightforward to implement and offers a wealth of information relating to the estimators and their properties. The proposed methodology is illustrated by analysing infant mortality in the province of British Columbia in Canada.

13.
We present a case study using the negative binomial regression model for discrete outcome data arising from a clinical trial designed to evaluate the effectiveness of a prehabilitation program in preventing functional decline among physically frail, community-living older persons. The primary outcome was a measure of disability at 7 months that had a range from 0 to 16 with a mean of 2.8 (variance of 16.4) and a median of 1. The data were right skewed with clumping at zero (i.e., 40% of subjects had no disability at 7 months). Because the variance was nearly 6 times greater than the mean, the negative binomial model provided an improved fit to the data and accounted better for overdispersion than the Poisson regression model, which assumes that the mean and variance are the same. Although correcting the variance and corresponding test statistics for overdispersion is a standard procedure in the Poisson model, the estimates of the regression parameters are inefficient because they have more sampling variability than is necessary. The negative binomial model provides an alternative approach for the analysis of discrete data where overdispersion is a problem, provided that the model is correctly specified and adequately fits the data.
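
The summary statistics quoted above (mean 2.8, variance 16.4, 40% zeros) can be used for a rough back-of-the-envelope check. The sketch below is not the regression analysis from the paper: it ignores covariates, assumes the common negative binomial parameterization Var = mean + mean²/k, and estimates k by the method of moments. The function names are hypothetical.

```python
import math


def nb_moment_summary(mean, variance):
    """Return (variance/mean ratio, moment estimate of k) assuming
    Var = mean + mean**2 / k (an assumed parameterization)."""
    if variance <= mean:
        raise ValueError("variance must exceed the mean for overdispersion")
    ratio = variance / mean
    k = mean ** 2 / (variance - mean)
    return ratio, k


def nb_zero_probability(mean, k):
    """P(Y = 0) under a negative binomial with the given mean and size k."""
    return (k / (k + mean)) ** k


def poisson_zero_probability(mean):
    """P(Y = 0) under a Poisson distribution with the given mean."""
    return math.exp(-mean)


# Marginal summary statistics reported in the abstract.
ratio, k = nb_moment_summary(2.8, 16.4)
p0_nb = nb_zero_probability(2.8, k)
p0_poisson = poisson_zero_probability(2.8)
print(f"variance/mean = {ratio:.2f}, k = {k:.3f}")
print(f"P(0): negative binomial = {p0_nb:.3f}, Poisson = {p0_poisson:.3f}")
```

Under these assumptions the negative binomial puts far more probability on zero than a Poisson distribution with the same mean, which is in line with the clumping at zero described in the abstract.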

14.
Multivariate count data are common in many disciplines. The variables in such data often exhibit complex positive or negative dependency structures. We propose three Bayesian approaches to modeling bivariate count data by simultaneously considering covariate-dependent means and correlation. A direct approach utilizes a bivariate negative binomial probability mass function developed in Famoye (2010, Journal of Applied Statistics). The second approach fits bivariate count data indirectly using a bivariate Poisson-gamma mixture model. The third approach is a bivariate Gaussian copula model. Based on the results from simulation analyses, the indirect and copula approaches perform better overall than the direct approach in terms of model fitting and identifying covariate-dependent association. The proposed approaches are applied to two RNA-sequencing data sets for studying breast cancer and melanoma (BRCA-US and SKCM-US), respectively, obtained through the International Cancer Genome Consortium.

15.
Count data are collected repeatedly over time in many applications, such as biology, epidemiology, and public health. Such data are often characterized by the following three features. First, correlation due to the repeated measures is usually accounted for using subject‐specific random effects, which are assumed to be normally distributed. Second, the sample variance may exceed the mean, and hence, the theoretical mean–variance relationship is violated, leading to overdispersion. This is usually allowed for based on a hierarchical approach, combining a Poisson model with gamma distributed random effects. Third, an excess of zeros beyond what standard count distributions can predict is often handled by either the hurdle or the zero‐inflated model. A zero‐inflated model assumes two processes as sources of zeros and combines a count distribution with a discrete point mass as a mixture, while the hurdle model separately handles zero observations and positive counts, where then a truncated‐at‐zero count distribution is used for the non‐zero state. In practice, however, all these three features can appear simultaneously. Hence, a modeling framework that incorporates all three is necessary, and this presents challenges for the data analysis. Such models, when conditionally specified, will naturally have a subject‐specific interpretation. However, adopting their purposefully modified marginalized versions leads to a direct marginal or population‐averaged interpretation for parameter estimates of covariate effects, which is the primary interest in many applications. In this paper, we present a marginalized hurdle model and a marginalized zero‐inflated model for correlated and overdispersed count data with excess zero observations and then illustrate these further with two case studies. 
The first dataset focuses on the Anopheles mosquito density around a hydroelectric dam, while adolescents’ involvement in work, to earn money and support their families or themselves, is studied in the second example. Sub‐models, which result from omitting zero‐inflation and/or overdispersion features, are also considered for comparison purposes. Analysis of the two datasets showed that accounting for the correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omission of any of them leads to incorrect marginal inference and erroneous conclusions about covariate effects. Copyright © 2014 John Wiley & Sons, Ltd.
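
The distinction drawn in this abstract between zero-inflated and hurdle models can be made concrete with the two probability mass functions. The sketch below uses a plain Poisson count distribution with no covariates or random effects, so it is a simplification of the marginalized models in the paper. The parameter names (`pi` for the mixing weight of the point mass, `p_zero` for the hurdle zero probability) are assumptions made for illustration.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of count y with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: a point mass at zero (weight pi) mixed with
    Poisson(lam), so zeros come from two sources."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def hurdle_pmf(y, lam, p_zero):
    """Hurdle Poisson: zeros occur with probability p_zero; positive counts
    follow a Poisson(lam) truncated at zero."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))


# Hypothetical parameter values, chosen only to show the difference at zero.
print(zip_pmf(0, 2.0, 0.3))     # pi + (1 - pi) * exp(-lam)
print(hurdle_pmf(0, 2.0, 0.3))  # exactly p_zero
```

In the zero-inflated model the probability of a zero always exceeds the mixing weight, because the count component also produces zeros. In the hurdle model all zeros are handled by a single parameter.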

16.
Background

Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of the variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact with an illustration based on the data of a study investigating the relationship between use of the Internet to seek health information and number of primary care consultations.

Methods

Three methods, overdispersed Poisson, a robust estimator, and negative binomial regression, were used to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested for overdispersion in the Poisson regression model using the ratio of the Pearson chi-square statistic (the sum of squared Pearson residuals) to the number of degrees of freedom (χ2/df). We then fitted the three models and compared their parameter estimates with those given by the Poisson regression model.

Results

The variance of the number of primary care consultations (Var[Y] = 21.03) was greater than the mean (E[Y] = 5.93) and the χ2/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of estimates from two variables (using the Internet to seek health information and single parent family) would have changed according to the model retained, with significance levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (use of a robust estimator) and 0.45 and 0.13 (negative binomial) respectively.

Conclusion

Different methods exist to solve the problem of underestimating variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems to be particularly accurate because of its theoretical distribution; in addition, this regression is easy to perform with ordinary statistical software packages.
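
The overdispersion check described in this abstract can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' analysis: it computes the Pearson chi-square statistic divided by the residual degrees of freedom, and then applies the usual quasi-Poisson correction, in which Poisson standard errors are multiplied by the square root of that ratio. The function names and the toy data are hypothetical; only the value 3.26 comes from the abstract.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom
    (number of observations minus number of fitted parameters)."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


def inflate_se(poisson_se, dispersion):
    """Quasi-Poisson correction: scale a Poisson standard error by sqrt(dispersion)."""
    return poisson_se * math.sqrt(dispersion)


# Toy data (hypothetical): four counts fitted by an intercept-only model.
toy_dispersion = pearson_dispersion([0, 2, 4, 10], [4.0, 4.0, 4.0, 4.0], n_params=1)

# With the ratio reported in the abstract, standard errors grow by about 80%.
se_factor = inflate_se(1.0, 3.26)
print(f"toy dispersion = {toy_dispersion:.3f}, SE inflation factor = {se_factor:.3f}")
```

A ratio near 1 is what a well-specified Poisson model would give. Wider standard errors under the correction are consistent with the larger significance levels reported for the overdispersed Poisson model.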

17.
Recently, different types of surveillance data have become available for public health purposes. In most cases, several variables are monitored and events of different types are reported. As the amount of surveillance data increases, statistical methods that can effectively address multivariate surveillance scenarios are in demand. Even though research activity in this field has increased rapidly in recent years, only a few approaches have simultaneously addressed the integer-valued property of the data and its correlation (both time correlation and cross-correlation) structure. In this article, we suggest a multivariate integer-valued autoregressive model that allows for both serial and cross-correlations between the series and can easily accommodate overdispersion and covariate information. Moreover, its structure implies a natural decomposition into an endemic and an epidemic component, a common distinction in dynamic models for infectious disease counts. Detection of disease outbreaks is achieved through the comparison of surveillance data with one-step-ahead predictions obtained after fitting the suggested model to a set of clean historical data. The performance of the suggested model is illustrated on a trivariate series of syndromic surveillance data collected during the Athens 2004 Olympic Games.

18.
Semkow TM 《Health physics》2002,83(4):485-496
New evidence is provided suggesting that radioassay data are frequently overdispersed with respect to the Poisson distribution. Twelve cases of radioassay data were measured using commonly available detection systems. The data were analyzed using a limited version of the overdispersion model developed earlier. In that limit, the relationships between three overdispersed distributions were derived and discussed: beta-Poisson, negative binomial, and overdispersed Gaussian. Out of a total of 13 cases studied (12 measured plus one from the literature), 4 were consistent with Poisson statistics at the 90% confidence level while the remaining 9 were found to be overdispersed. This shows that overdispersion is rather prevalent in radioassay. All three overdispersed distributions fitted the data very well. The overdispersion was attributed mostly to the excess fluctuations of the detection systems or, in 2 cases, sequential radioactive decay.

19.
Previous studies have suggested a link between alcohol outlets and assaults. In this paper, we explore the effects of alcohol availability on assaults at the census tract level over time. In addition, we use a natural experiment to check whether a sudden loss of alcohol outlets is associated with a larger decrease in assault violence. Several features of the data raise statistical challenges: (1) the association between covariates (for example, the alcohol outlet density of each census tract) and the assault rates may be complex and therefore cannot be described using a linear model without covariate transformation, (2) the covariates may be highly correlated with each other, (3) there are a number of observations that have missing inputs, and (4) there is spatial association in assault rates at the census tract level. We propose a hierarchical additive model, where the nonlinear correlations and the complex interaction effects are modeled using multiple additive regression trees and the residual spatial association in the assault rates that cannot be explained by the model is smoothed using a conditional autoregressive (CAR) method. We develop a two‐stage algorithm that connects the nonparametric trees with CAR to look for important covariates associated with the assault rates, while taking into account the spatial association of assault rates in adjacent census tracts. The proposed method is applied to the Los Angeles assault data (1990–1999). To assess the efficiency of the method, the results are compared with those obtained from a hierarchical linear model. Copyright © 2009 John Wiley & Sons, Ltd.

20.
OBJECTIVE: To investigate the association between homicide rates and socio-economic variables, taking into account the spatial location of the indicators. METHODS: An ecological study was conducted. The dependent variable was the rate of homicides among the male population aged 15 to 49 years, residing in the districts of the State of Pernambuco from 1995 to 1998. The independent variables were an index of living conditions, per capita family income, the Theil inequality index, the Gini index, average income of the head of the family, the poverty index, rate of illiteracy, and demographic density. The following techniques were used in the analysis: a spatial autocorrelation test determined by the Moran index, multiple linear regression, a spatial regression model (CAR) and a generalized additive model for the detection of spatial trend (LOESS). RESULTS: Illiteracy and the poverty index explained 24.6% of the total variability of the homicide rates, and the relationship was inverse. Moran's I statistics indicated spatial autocorrelation between municipalities. The regression model that fitted best for the purposes of this study was the Conditional Auto Regressive (CAR) model, which confirmed the association between the poverty index, illiteracy and homicide rates. CONCLUSIONS: The inverse association observed between socio-economic indicators and homicides may reflect a process that promotes improvement in living conditions and that is linked predominantly to conditions that generate violence, such as drug trafficking.
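
The Moran index used here as the spatial autocorrelation test can be sketched as follows. This is a minimal illustration of the global Moran's I statistic with a binary adjacency matrix. The four-district layout and the rates are hypothetical and are not data from the study, and the permutation or normal-approximation test that normally accompanies the statistic is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and a spatial weight matrix
    given as a list of rows (weights[i][j] > 0 if areas i and j are neighbours)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights.
    s0 = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between pairs of areas.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    # Total sum of squared deviations.
    ss = sum(d * d for d in dev)
    return (n / s0) * (cross / ss)


# Hypothetical layout: four districts in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_rates = [1.0, 2.0, 3.0, 4.0]       # neighbours have similar values
alternating_rates = [1.0, 4.0, 1.0, 4.0]  # neighbours have opposite values

print(morans_i(smooth_rates, chain))       # positive: similar values cluster
print(morans_i(alternating_rates, chain))  # negative: neighbours differ
```

Positive values indicate that neighbouring areas tend to have similar rates, which is the situation in which a spatial model such as CAR is usually considered in place of ordinary linear regression.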
