Similar Literature
20 similar documents found.
1.
Zero‐inflated Poisson (ZIP) and negative binomial (ZINB) models are widely used to model zero‐inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non‐risk group in the population, the ZIP (ZINB) models a two‐component population mixture. As in applications of Poisson and NB, the key difference between ZIP and ZINB is the allowance for overdispersion by the ZINB in its NB component in modeling the count response for the at‐risk group. In practice, however, overdispersion often does not follow the NB, and applying the ZINB to such data yields invalid inference. If the sources of overdispersion are known, other parametric models may be used to model the overdispersion directly, but such models are likewise subject to distributional assumptions, and this approach is not applicable when information about the sources of overdispersion is unavailable. In this paper, we propose a distribution‐free alternative and compare its performance with these popular parametric models as well as a moment‐based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390–2405]. Like the generalized estimating equations, the proposed approach requires no elaborate distributional assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero‐inflated responses. We illustrate our approach with both simulated and real study data. Copyright © 2015 John Wiley & Sons, Ltd.
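For readers unfamiliar with the two‐component mixture referenced throughout, a minimal ZIP probability mass function can be written down directly. This is only an illustration of the shared mixture form, not the paper's distribution‐free estimator; all names (`zip_pmf`, `pi_`, `lam`) are our own.

```python
import math

def zip_pmf(k, pi_, lam):
    """ZIP: degenerate mass at 0 (weight pi_) mixed with Poisson(lam)."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_ + (1 - pi_) * pois
    return (1 - pi_) * pois

pi_, lam = 0.3, 2.0
p0_zip = zip_pmf(0, pi_, lam)
p0_pois = math.exp(-lam)  # zero probability under a plain Poisson(lam)
total = sum(zip_pmf(k, pi_, lam) for k in range(100))
```

With `pi_ = 0.3` the zero probability rises from exp(-2) to roughly 0.4; this inflated zero mass is exactly the feature the models above target.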

2.
A number of mixture modeling approaches assume both normality and independent observations. However, these two assumptions are at odds with the reality of many data sets, which are often characterized by an abundance of zero‐valued or highly skewed observations as well as observations from biologically related (i.e., non‐independent) subjects. We present here a finite mixture model with a zero‐inflated Poisson regression component that may be applied to both types of data. This flexible approach allows the use of covariates to model both the Poisson mean and rate of zero inflation and can incorporate random effects to accommodate non‐independent observations. We demonstrate the utility of this approach by applying these models to a candidate endophenotype for schizophrenia, but the same methods are applicable to other types of data characterized by zero inflation and non‐independence. Copyright © 2014 John Wiley & Sons, Ltd.

3.
In practice, count data may exhibit varying dispersion patterns and excessive zero values; additionally, they may appear in groups or clusters sharing a common source of variation. We present a novel Bayesian approach for analyzing such data. To model these features, we combine the Conway‐Maxwell‐Poisson distribution, which allows both overdispersion and underdispersion, with a hurdle component for the zeros and random effects for clustering. We propose an efficient Markov chain Monte Carlo sampling scheme to obtain posterior inference from our model. Through simulation studies, we compare our hurdle Conway‐Maxwell‐Poisson model with a hurdle Poisson model to demonstrate the effectiveness of our Conway‐Maxwell‐Poisson approach. Furthermore, we apply our model to analyze an illustrative dataset containing information on the number and types of carious lesions on each tooth in a population of 9‐year‐olds from the Iowa Fluoride Study, which is an ongoing longitudinal study on a cohort of Iowa children that began in 1991.
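The Conway‐Maxwell‐Poisson component can be sketched numerically. Its pmf is proportional to lam^k / (k!)^nu, and the snippet below (our naming, truncated normalization) checks that nu = 1 recovers the Poisson while nu < 1 and nu > 1 give over- and underdispersion, the flexibility the abstract relies on.

```python
import math

def cmp_moments(lam, nu, K=60):
    """Mean and variance of a Conway-Maxwell-Poisson, normalized over 0..K-1."""
    ws = [lam ** k / math.factorial(k) ** nu for k in range(K)]
    Z = sum(ws)  # truncated normalizing constant Z(lam, nu)
    mean = sum(k * w for k, w in enumerate(ws)) / Z
    var = sum(k * k * w for k, w in enumerate(ws)) / Z - mean ** 2
    return mean, var

m1, v1 = cmp_moments(2.0, 1.0)  # nu = 1 reduces to Poisson: var equals mean
m2, v2 = cmp_moments(2.0, 0.5)  # nu < 1: overdispersed (var > mean)
m3, v3 = cmp_moments(2.0, 2.0)  # nu > 1: underdispersed (var < mean)
```

The truncation at K = 60 is more than adequate for these rates; the Bayesian hurdle machinery of the paper is not reproduced here.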

4.
Applications of zero‐inflated count data models have proliferated in health economics. However, zero‐inflated Poisson or zero‐inflated negative binomial maximum likelihood estimators are not robust to misspecification. This article proposes Poisson quasi‐likelihood estimators as an alternative. These estimators are consistent in the presence of excess zeros without having to specify the full distribution. The advantages of the Poisson quasi‐likelihood approach are illustrated in a series of Monte Carlo simulations and in an application to the demand for health services. Copyright © 2012 John Wiley & Sons, Ltd.

5.
Count data are collected repeatedly over time in many applications, such as biology, epidemiology, and public health. Such data are often characterized by the following three features. First, correlation due to the repeated measures is usually accounted for using subject‐specific random effects, which are assumed to be normally distributed. Second, the sample variance may exceed the mean, and hence, the theoretical mean–variance relationship is violated, leading to overdispersion. This is usually allowed for based on a hierarchical approach, combining a Poisson model with gamma distributed random effects. Third, an excess of zeros beyond what standard count distributions can predict is often handled by either the hurdle or the zero‐inflated model. A zero‐inflated model assumes two processes as sources of zeros and combines a count distribution with a discrete point mass as a mixture, while the hurdle model separately handles zero observations and positive counts, where then a truncated‐at‐zero count distribution is used for the non‐zero state. In practice, however, all these three features can appear simultaneously. Hence, a modeling framework that incorporates all three is necessary, and this presents challenges for the data analysis. Such models, when conditionally specified, will naturally have a subject‐specific interpretation. However, adopting their purposefully modified marginalized versions leads to a direct marginal or population‐averaged interpretation for parameter estimates of covariate effects, which is the primary interest in many applications. In this paper, we present a marginalized hurdle model and a marginalized zero‐inflated model for correlated and overdispersed count data with excess zero observations and then illustrate these further with two case studies. 
The first dataset focuses on the Anopheles mosquito density around a hydroelectric dam, while the second example studies adolescents' involvement in work to earn money and support their families or themselves. Sub‐models, which result from omitting the zero‐inflation and/or overdispersion features, are also considered for comparison purposes. Analysis of the two datasets showed that accounting for the correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omitting any of them leads to incorrect marginal inference and erroneous conclusions about covariate effects. Copyright © 2014 John Wiley & Sons, Ltd.

6.
Zero‐inflated Poisson regression is a popular tool for analyzing data with excessive zeros. Although much work has been done on fitting zero‐inflated data, most models depend heavily on special features of the individual data; specifically, a sizable group of respondents may endorse the same answer, producing peaks in the data. In this paper, we propose a new model flexible enough to handle excessive counts at values other than zero. The model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of the inflated counts, namely zeros, K (where K is a positive integer), and all other values, and the Poisson regression component models the counts assumed to follow a Poisson distribution. Two examples are provided to illustrate our models when the data have counts containing many ones and sixes. In both, the zero‐inflated and K‐inflated models exhibit a better fit than the zero‐inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.
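A sketch of the kind of zero-and-K-inflated mixture described here, with illustrative weights and K = 6; the paper's model attaches covariates through the multinomial logistic and Poisson components, which we omit, and all names are ours.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def k_inflated_pmf(y, lam, p0, pK, K):
    """Mixture: point masses at 0 and at K plus a Poisson(lam) bulk."""
    base = (1 - p0 - pK) * poisson_pmf(y, lam)
    if y == 0:
        base += p0
    if y == K:
        base += pK
    return base

lam, p0, pK, K = 2.0, 0.2, 0.15, 6
total = sum(k_inflated_pmf(y, lam, p0, pK, K) for y in range(100))
```

The point mass at K = 6 raises P(Y = 6) well above its Poisson value, mirroring the "many sixes" example in the abstract.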

7.
We propose functional linear models for zero‐inflated count data with a focus on the functional hurdle and functional zero‐inflated Poisson (ZIP) models. While the hurdle model assumes the counts come from a mixture of a degenerate distribution at zero and a zero‐truncated Poisson distribution, the ZIP model considers a mixture of a degenerate distribution at zero and a standard Poisson distribution. We extend the generalized functional linear model framework with a functional predictor and multiple cross‐sectional predictors to model counts generated by a mixture distribution. We propose an estimation procedure for functional hurdle and ZIP models, called penalized reconstruction, geared towards error‐prone and sparsely observed longitudinal functional predictors. The approach relies on dimension reduction and pooling of information across subjects involving basis expansions and penalized maximum likelihood techniques. The developed functional hurdle model is applied to modeling hospitalizations within the first 2 years from initiation of dialysis, with a high percentage of zeros, in the Comprehensive Dialysis Study participants. Hospitalization counts are modeled as a function of sparse longitudinal measurements of serum albumin concentrations, patient demographics, and comorbidities. Simulation studies are used to study finite sample properties of the proposed method and include comparisons with an adaptation of standard principal components regression. Copyright © 2014 John Wiley & Sons, Ltd.

8.
In medical and health studies, heterogeneities in clustered count data have been traditionally modeled by positive random effects in Poisson mixed models; however, excessive zeros often occur in clustered medical and health count data. In this paper, we consider a three‐level random effects zero‐inflated Poisson model for health‐care utilization data where data are clustered by both subjects and families. To accommodate zero and positive components in the count response compatibly, we model the subject level random effects by a compound Poisson distribution. Our model displays a variance components decomposition which clearly reflects the hierarchical structure of clustered data. A quasi‐likelihood approach has been developed in the estimation of our model. We illustrate the method with analysis of the health‐care utilization data. The performance of our method is also evaluated through simulation studies. Copyright © 2009 John Wiley & Sons, Ltd.

9.
The zero‐inflated Poisson (ZIP) regression model is often employed in public health research to examine the relationships between exposures of interest and a count outcome exhibiting many zeros, in excess of the amount expected under sampling from a Poisson distribution. The regression coefficients of the ZIP model have latent class interpretations, which correspond to a susceptible subpopulation at risk for the condition with counts generated from a Poisson distribution and a non‐susceptible subpopulation that provides the extra or excess zeros. The ZIP model parameters, however, are not well suited for inference targeted at marginal means, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. We develop a marginalized ZIP model approach for independent responses to model the population mean count directly, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log‐incidence density ratios. Through simulation studies, the performance of maximum likelihood estimation of the marginalized ZIP model is assessed and compared with other methods of estimating overall exposure effects. The marginalized ZIP model is applied to a recent study of a motivational interviewing‐based safer sex counseling intervention, designed to reduce unprotected sexual act counts. Copyright © 2014 John Wiley & Sons, Ltd.
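The abstract's point, that ZIP latent-class coefficients do not quantify overall exposure effects, can be seen numerically. In the sketch below (illustrative values, our naming) the marginal mean is mu = (1 - pi) * lam, so when an exposure shifts both components, the population-averaged log ratio differs from the Poisson-component log ratio that a conventional ZIP fit reports.

```python
import math

def zip_mean(pi_, lam):
    """Marginal (population-averaged) mean of a ZIP(pi_, lam) outcome."""
    return (1 - pi_) * lam

# Suppose an exposure doubles lam and shifts pi_ from 0.4 to 0.2.
mu0 = zip_mean(0.4, 1.5)
mu1 = zip_mean(0.2, 3.0)
overall_log_ratio = math.log(mu1 / mu0)  # marginal effect on the mean count
latent_log_ratio = math.log(3.0 / 1.5)   # Poisson-component effect only
```

Here the two log ratios disagree, which is the gap the marginalized ZIP model is designed to close by placing the regression directly on mu.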

10.
This paper proposes a new statistical approach for predicting postoperative morbidity such as intensive care unit length of stay and number of complications after cardiac surgery in children. In a recent multi‐center study sponsored by the National Institutes of Health, 311 children undergoing cardiac surgery were enrolled. Morbidity data are count data in which the observations take only nonnegative integer values. Often, the number of zeros in the sample cannot be accommodated properly by a simple model, thus requiring a more complex model such as the zero‐inflated Poisson regression model. We are interested in identifying important risk factors for postoperative morbidity among many candidate predictors. There is only limited methodological work on variable selection for zero‐inflated regression models. In this paper, we consider regularized zero‐inflated Poisson models through a penalized likelihood function and develop a new expectation–maximization algorithm for numerical optimization. Simulation studies show that the proposed method has better performance than some competing methods. Applying the proposed methods to the postoperative morbidity data improved the model fit and identified important clinical and biomarker risk factors. Copyright © 2014 John Wiley & Sons, Ltd.

11.
Count data often arise in biomedical studies, while there could be a special feature with excessive zeros in the observed counts. The zero‐inflated Poisson model provides a natural approach to accounting for the excessive zero counts. In the semiparametric framework, we propose a generalized partially linear single‐index model for the mean of the Poisson component, the probability of zero, or both. We develop the estimation and inference procedure via a profile maximum likelihood method. Under some mild conditions, we establish the asymptotic properties of the profile likelihood estimators. The finite sample performance of the proposed method is demonstrated by simulation studies, and the new model is illustrated with a medical care dataset. Copyright © 2014 John Wiley & Sons, Ltd.

12.
Health services data often contain a high proportion of zeros. In studies examining patient hospitalization rates, for instance, many patients will have no hospitalizations, resulting in a count of zero. When the number of zeros is greater or less than expected under a standard count model, the data are said to be zero modified relative to the standard model. A similar phenomenon arises with semicontinuous data, which are characterized by a spike at zero followed by a continuous distribution with positive support. When analyzing zero‐modified count and semicontinuous data, flexible mixture distributions are often needed to accommodate both the excess zeros and the typically skewed distribution of nonzero values. Various models have been introduced over the past three decades to accommodate such data, including hurdle models, zero‐inflated models, and two‐part semicontinuous models. This tutorial describes recent modeling strategies for zero‐modified count and semicontinuous data and highlights their role in health services research studies. Part 1 of the tutorial, presented here, provides a general overview of the topic. Part 2, appearing as a companion piece in this issue of Statistics in Medicine, discusses three case studies illustrating applications of the methods to health services research. Copyright © 2016 John Wiley & Sons, Ltd.
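The distinction the tutorial draws between hurdle and zero-inflated models can be made concrete. The sketch below (our own naming) parameterizes both for the Poisson case; with a shared rate, matching the hurdle's zero probability to the ZIP's makes the two pmfs coincide, so for Poisson counts the difference lies in parameterization and interpretation rather than in the distributions reachable.

```python
import math

def pois(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi_, lam):
    """Zero-inflated: extra mass pi_ at zero on top of Poisson zeros."""
    if k == 0:
        return pi_ + (1 - pi_) * pois(0, lam)
    return (1 - pi_) * pois(k, lam)

def hurdle_pmf(k, p0, lam):
    """Hurdle: P(0) modeled directly; positives are zero-truncated Poisson."""
    if k == 0:
        return p0
    return (1 - p0) * pois(k, lam) / (1 - pois(0, lam))

pi_, lam = 0.25, 2.0
p0 = zip_pmf(0, pi_, lam)  # match the hurdle's zero probability to the ZIP's
diffs = [abs(zip_pmf(k, pi_, lam) - hurdle_pmf(k, p0, lam)) for k in range(30)]
```

A hurdle model can also describe zero deflation (p0 below the Poisson zero probability), which the ZIP mixture cannot; that asymmetry is one reason the tutorial treats them separately.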

13.
Epidemic data often possess certain characteristics, such as the presence of many zeros, the spatial nature of the disease spread mechanism, environmental noise, serial correlation and dependence on time‐varying factors. This paper addresses these issues via suitable Bayesian modelling. In doing so, we utilize a general class of stochastic regression models appropriate for spatio‐temporal count data with an excess number of zeros. The developed regression framework incorporates serial correlation and time‐varying covariates through an Ornstein–Uhlenbeck process formulation. In addition, we explore the effect of different priors, including default options and variations of mixtures of g‐priors. The effect of different distance kernels for the epidemic model component is investigated. We proceed by developing branching process‐based methods for testing scenarios for disease control, thus linking traditional epidemiological models with stochastic epidemic processes, useful in policy‐focused decision making. The approach is illustrated with an application to a sheep pox dataset from the Evros region, Greece. Copyright © 2017 John Wiley & Sons, Ltd.

14.
In many medical studies, count data contain excessive zeros, which render the standard Poisson regression model inadequate. We propose a covariate‐dependent random effect model to accommodate the excess zeros and the heterogeneity in the population simultaneously. This work is motivated by a data set from a survey on the dental health status of Hong Kong preschool children, where the response variable is the number of decayed, missing, or filled teeth. The random effect has a sound biological interpretation as the overall oral health status or other personal qualities of an individual child that are unobserved and not easily quantified. The overall measure of oral health status, responsible for accommodating both the excessive zeros and the heterogeneity among the children, is covariate dependent. This covariate‐dependent random effect model allows one to distinguish whether a potential covariate affects the conceived overall oral health condition of the children (that is, the random effect), has a direct effect on the magnitude of the counts, or both. We propose a multiple imputation approach for estimating the parameters and discuss the choice of the imputation size. We evaluate the performance of the proposed estimation method through simulation studies, and we apply the model and method to the dental data. Copyright © 2012 John Wiley & Sons, Ltd.

15.
In the fight against hard‐to‐treat diseases such as cancer, it is often difficult to discover new treatments that benefit all subjects. For regulatory agency approval, it is more practical to identify subgroups of subjects for whom the treatment has an enhanced effect. Regression trees are natural for this task because they partition the data space. We briefly review existing regression tree algorithms. Then, we introduce three new ones that are practically free of selection bias and are applicable to data from randomized trials with two or more treatments, censored response variables, and missing values in the predictor variables. The algorithms extend the generalized unbiased interaction detection and estimation (GUIDE) approach by using three key ideas: (i) treatment as a linear predictor, (ii) chi‐squared tests to detect residual patterns and lack of fit, and (iii) proportional hazards modeling via Poisson regression. Importance scores with thresholds for identifying influential variables are obtained as by‐products. A bootstrap technique is used to construct confidence intervals for the treatment effects in each node. The methods are compared using real and simulated data. Copyright © 2015 John Wiley & Sons, Ltd.

16.

Objective

To propose a more realistic model for disease cluster detection, through a modification of the spatial scan statistic to account simultaneously for inflated zeros and overdispersion.

Introduction

Spatial scan statistics [1] usually assume Poisson or binomially distributed data, which is not adequate in many disease surveillance scenarios. For example, small areas distant from hospitals may exhibit a smaller number of cases than expected under those simple models. Underreporting may also occur in underdeveloped regions, owing to inefficient data collection or the difficulty of accessing remote sites. These factors generate excess zero case counts or overdispersion, violating the statistical model and inflating the type I error (false alarms). Overdispersion occurs when the data variance is greater than that predicted by the model in use; since the Poisson model forces the variance to equal the mean, accommodating overdispersion requires an extra parameter.
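As a minimal illustration of the overdispersion just defined, the snippet below simulates zero-inflated counts and compares the sample variance with the sample mean; a ratio above 1 signals overdispersion relative to the Poisson. The data-generating choices are ours, purely for illustration.

```python
import random
import statistics

random.seed(1)
# 40% structural zeros, otherwise a Binomial(10, 0.5) count: this mixture has
# variance well above its mean, unlike a Poisson.
data = [0 if random.random() < 0.4 else sum(random.random() < 0.5 for _ in range(10))
        for _ in range(2000)]

m = statistics.mean(data)
v = statistics.variance(data)
dispersion = v / m  # > 1 indicates overdispersion relative to Poisson
```

For this mixture the theoretical ratio is 2.5, so a fitted Poisson model would badly understate the variance, exactly the failure mode described above.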

Methods

Tools like the Generalized Poisson (GP) and the Double Poisson [2] may be better options for this kind of problem, modeling the mean and variance separately so that each can easily be adjusted by covariates. When excess zeros occur, the Zero Inflated Poisson (ZIP) model is used, although ZIP's estimated parameters may be severely biased if the nonzero counts are too dispersed relative to the Poisson distribution. In that case, the zero-inflated versions of the Generalized Poisson (ZIGP), Double Poisson (ZIDP), and Negative Binomial (ZINB) are good alternatives for jointly modeling excess zeros and overdispersion. On the one hand, Zero Inflated Poisson (ZIP) models have been proposed within the spatial scan statistic to deal with excess zeros [3]. On the other hand, another spatial scan statistic was based on a Poisson-Gamma mixture model for overdispersion [4]. In this work we present a model, based on the ZIDP, that includes inflated zeros and overdispersion simultaneously. Let the parameter p indicate the zero inflation. Because p and the remaining parameters of the observed cases map are not independent, maximizing the likelihood is not straightforward, and it becomes even more complicated when covariates are included in the analysis. To solve this problem we introduce a vector of latent variables that factorizes the likelihood, making the maximization tractable via the E-M (Expectation-Maximization) algorithm. We derive the formulas for iteratively maximizing the likelihood and implement a computer program that uses the E-M algorithm to estimate the parameters under the null and alternative hypotheses. The p-value is obtained via the Fast Double Bootstrap Test [5].
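The latent-variable E-M idea described here can be sketched in its simplest form, a ZIP model without covariates; the work above embeds this inside a spatial scan likelihood with covariates, which we do not reproduce. All names (`rpois`, `pi_`, `lam`, `z`) are our own.

```python
import math
import random

random.seed(7)
pi_true, lam_true, n = 0.3, 2.5, 5000

def rpois(lam):
    # Knuth's multiplication method; adequate for small lam.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

y = [0 if random.random() < pi_true else rpois(lam_true) for _ in range(n)]

pi_, lam = 0.5, 1.0  # crude starting values
for _ in range(200):
    # E-step: posterior probability that each observed zero is a structural zero.
    z = [pi_ / (pi_ + (1 - pi_) * math.exp(-lam)) if yi == 0 else 0.0 for yi in y]
    # M-step: closed-form updates given the imputed latent indicators.
    pi_ = sum(z) / n
    lam = sum((1 - zi) * yi for zi, yi in zip(z, y)) / sum(1 - zi for zi in z)
```

The latent indicator z is exactly the factorizing device the abstract refers to: conditioning on it splits the likelihood into a Bernoulli part for pi and a Poisson part for lam, each with a closed-form maximizer.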

Results

Numerical simulations are conducted to assess the effectiveness of the method. We present results for Hanseniasis surveillance in the Brazilian Amazon in 2010 using this technique. We obtain the most likely spatial clusters for the Poisson, ZIP, Poisson-Gamma mixture and ZIDP models and compare the results.

Conclusions

The Zero Inflated Double Poisson spatial scan statistic for disease cluster detection incorporates the flexibility of the previous models, accounting for inflated zeros and overdispersion simultaneously. The Hanseniasis case study map, with its excess of zero case counts in many municipalities of the Brazilian Amazon and the presence of overdispersion, was a good benchmark for the ZIDP model. The results are easier to interpret than those of either of the previous spatial scan statistic models, the Zero Inflated Poisson (ZIP) model and the Poisson-Gamma mixture model for overdispersion, taken separately. The E-M algorithm and the Fast Double Bootstrap test are computationally efficient for this type of problem.

17.
Much attention has been paid to estimating the causal effect of adherence to a randomized protocol using instrumental variables to adjust for unmeasured confounding. Researchers tend to use the instrumental variable within one of the three main frameworks: regression with an endogenous variable, principal stratification, or structural‐nested modeling. We found in our literature review that even in simple settings, causal interpretations of analyses with endogenous regressors can be ambiguous or rely on a strong assumption that can be difficult to interpret. Principal stratification and structural‐nested modeling are alternative frameworks that render unambiguous causal interpretations based on assumptions that are, arguably, easier to interpret. Our interest stems from a wish to estimate the effect of cluster‐level adherence on individual‐level binary outcomes with a three‐armed cluster‐randomized trial and polytomous adherence. Principal stratification approaches to this problem are quite challenging because of the sheer number of principal strata involved. Therefore, we developed a structural‐nested modeling approach and, in the process, extended the methodology to accommodate cluster‐randomized trials with unequal probability of selecting individuals. Furthermore, we developed a method to implement the approach with relatively simple programming. The approach works quite well, but when the structural‐nested model does not fit the data, there is no solution to the estimating equation. We investigate the performance of the approach using simulated data, and we also use the approach to estimate the effect on pupil absence of school‐level adherence to a randomized water, sanitation, and hygiene intervention in western Kenya. Copyright © 2013 John Wiley & Sons, Ltd.

18.
Lee AH, Xiang L, Fung WK. Statistics in Medicine 2004; 23(17): 2757–2769
In many biomedical applications, count data have a large proportion of zeros and the zero-inflated Poisson regression (ZIP) model may be appropriate. A popular score test for zero-inflation, comparing the ZIP model to a standard Poisson regression model, was given by van den Broek. Similarly, for count data that exhibit extra zeros and are simultaneously overdispersed, a score test for testing the ZIP model against a zero-inflated negative binomial alternative was proposed by Ridout, Hinde and Demétrio. However, these test statistics are sensitive to anomalous cases in the data, and incorrect inferences concerning the choice of model may be drawn. In this paper, diagnostic measures are derived to assess the influence of observations on the score statistics. Two examples that motivated the application of zero-inflated regression models are considered to illustrate the importance of sensitivity analysis of the zero-inflation tests.
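A hedged sketch of the intercept-only form of van den Broek's score test for zero inflation against a Poisson null; the abstract concerns influence diagnostics for this statistic in the regression setting, which we do not reproduce, and the function name is ours.

```python
import math

def zip_score_stat(y):
    """Score statistic for zero inflation vs. an intercept-only Poisson fit."""
    n = len(y)
    ybar = sum(y) / n
    p0 = math.exp(-ybar)  # fitted P(Y = 0) under the Poisson MLE
    n0 = sum(1 for yi in y if yi == 0)
    num = (n0 / p0 - n) ** 2
    den = n * (1 - p0) / p0 - n * ybar
    return num / den  # approximately chi-squared(1) under the Poisson null

# A sample with many more zeros than a Poisson with the same mean predicts:
inflated = [0] * 60 + [1, 2, 3, 2, 1, 4, 2, 3] * 5
stat = zip_score_stat(inflated)
```

Here the statistic far exceeds the 5% chi-squared(1) cutoff of 3.84, so the Poisson null would be rejected in favor of zero inflation. A single influential observation can move this statistic appreciably, which is the sensitivity issue the paper's diagnostics address.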

19.
Commonly in biomedical research, studies collect data in which an outcome measure contains informative excess zeros; for example, when observing the burden of neuritic plaques (NPs) in brain pathology studies, those who show none contribute to our understanding of neurodegenerative disease. The outcome may be characterized by a mixture distribution with one component being the “structural zero” and the other component being a Poisson distribution. We propose a novel variance components score test of genetic association between a set of genetic markers and a zero-inflated count outcome from a mixture distribution. This test shares advantageous properties with single-nucleotide polymorphism (SNP)-set tests which have been previously devised for standard continuous or binary outcomes, such as the sequence kernel association test. In particular, our method has superior statistical power compared to competing methods, especially when there is correlation within the group of markers, and when the SNPs are associated with both the mixing proportion and the rate of the Poisson distribution. We apply the method to Alzheimer’s data from the Rush University Religious Orders Study and Memory and Aging Project, where as proof of principle we find highly significant associations with the APOE gene, in both the “structural zero” and “count” parameters, when applied to a zero-inflated NPs count outcome.

20.
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data‐adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V‐fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS‐MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.
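The single-split ensemble learner (EL) contrasted with super learning above can be illustrated in miniature: fit candidate predictors on a training portion, then choose the convex combination that minimizes validation error. The data, the two candidates (a global mean and a least-squares line), and all names below are illustrative assumptions, not the paper's library of algorithms.

```python
import random

random.seed(3)
x = [random.uniform(0, 1) for _ in range(400)]
ytrue = [0.2 + 0.6 * xi + random.gauss(0, 0.05) for xi in x]
train, valid = list(range(300)), list(range(300, 400))  # single partition (EL)

# Candidate 1: global training mean.
mean_pred = sum(ytrue[i] for i in train) / len(train)
# Candidate 2: simple least-squares line fitted on the training set.
mx = sum(x[i] for i in train) / len(train)
b = (sum((x[i] - mx) * (ytrue[i] - mean_pred) for i in train)
     / sum((x[i] - mx) ** 2 for i in train))
a = mean_pred - b * mx

def mse(w):
    """Validation MSE of the convex combination w*mean + (1-w)*line."""
    return sum((w * mean_pred + (1 - w) * (a + b * x[i]) - ytrue[i]) ** 2
               for i in valid) / len(valid)

best_w = min((w / 20 for w in range(21)), key=mse)  # grid over convex weights
```

Super learning differs only in replacing the single partition with V-fold cross validation when scoring candidate weights, at correspondingly higher computational cost.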


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)