Similar Articles
 20 similar articles retrieved (search time: 15 ms)
1.
Count data are collected repeatedly over time in many applications, such as biology, epidemiology, and public health. Such data are often characterized by the following three features. First, correlation due to the repeated measures is usually accounted for using subject‐specific random effects, which are assumed to be normally distributed. Second, the sample variance may exceed the mean, so that the theoretical mean–variance relationship is violated, leading to overdispersion. This is usually allowed for via a hierarchical approach, combining a Poisson model with gamma-distributed random effects. Third, an excess of zeros beyond what standard count distributions can predict is often handled by either the hurdle or the zero‐inflated model. A zero‐inflated model assumes two processes as sources of zeros and combines a count distribution with a discrete point mass as a mixture, while the hurdle model handles zero observations and positive counts separately, using a truncated‐at‐zero count distribution for the non‐zero state. In practice, however, all three features can appear simultaneously, so a modeling framework that incorporates all of them is necessary, and this presents challenges for the data analysis. Such models, when conditionally specified, naturally have a subject‐specific interpretation. However, adopting purposefully modified marginalized versions yields a direct marginal, or population‐averaged, interpretation for the parameter estimates of covariate effects, which is the primary interest in many applications. In this paper, we present a marginalized hurdle model and a marginalized zero‐inflated model for correlated and overdispersed count data with excess zero observations and illustrate both with two case studies.
The first dataset concerns Anopheles mosquito density around a hydroelectric dam, while the second examines adolescents' involvement in work to earn money and support their families or themselves. Sub‐models that omit the zero‐inflation and/or overdispersion features are also considered for comparison purposes. Analysis of the two datasets showed that accounting for correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omitting any of them led to incorrect marginal inference and erroneous conclusions about covariate effects. Copyright © 2014 John Wiley & Sons, Ltd.
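The contrast drawn above between the zero‐inflated and hurdle formulations can be made concrete as probability mass functions. A minimal Python sketch (the function and parameter names are ours, not the paper's):

```python
import numpy as np
from scipy.stats import poisson

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: mixes a point mass at 0 (weight pi)
    with a standard Poisson(lam) count distribution."""
    return pi * (k == 0) + (1 - pi) * poisson.pmf(k, lam)

def hurdle_pmf(k, lam, p0):
    """Poisson hurdle: zeros come from their own process with P(Y=0) = p0;
    positive counts follow a truncated-at-zero Poisson(lam)."""
    if k == 0:
        return p0
    return (1 - p0) * poisson.pmf(k, lam) / (1 - poisson.pmf(0, lam))
```

Both place extra mass at zero, but the hurdle model pins P(Y=0) exactly at its own parameter, while the zero‐inflated model adds structural zeros on top of the Poisson zeros.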

2.
Zero excess in the study of geographically referenced mortality datasets has received considerable attention in the literature, with zero‐inflation being the most common procedure for handling this lack of fit. Although hurdle models have also been used in disease mapping studies, their use is rarer. We show in this paper that models with particular treatments of zero excess are often required for achieving appropriate fits in regular mortality studies, since otherwise geographical units with low expected counts are oversmoothed. However, as we also show, an indiscriminate treatment of zero excess may be unnecessary and problematic to implement. In this regard, we find that naive zero‐inflation and hurdle models, without explicit modeling of the probabilities of zeros, do not fix zero-excess problems well enough and are clearly unsatisfactory. The results strongly suggest the need for explicit modeling of these probabilities, which should vary across areal units. Unfortunately, such more flexible modeling strategies can easily lead to improper posterior distributions, as we prove in several theoretical results. These procedures have been used repeatedly in the disease mapping literature, and one should bear these issues in mind in order to propose valid models. Finally, in line with these results, we propose several valid modeling alternatives that are suitable for fitting zero excess. We show that these proposals fix zero-excess problems and correct the oversmoothing of risks in sparsely populated units, depicting geographic patterns better suited to the data.

3.
This study fills current knowledge gaps in the statistical analysis of longitudinal zero‐inflated count data by providing a comprehensive review and comparison of the hurdle and zero‐inflated Poisson models in terms of conceptual framework, computational advantage, and performance under different real-data situations. The simulation design reflects the special features of a well‐known longitudinal study of alcoholism, so that the results can be generalized to the substance abuse field. When the hurdle model is the more natural choice under the conceptual framework of the data, the zero‐inflated Poisson model tends to produce inaccurate estimates. Model performance improves with larger sample sizes, lower proportions of missing data, and lower correlations between covariates. The simulation also shows that the computational advantage of the hurdle model disappears when random effects are included. Copyright © 2012 John Wiley & Sons, Ltd.
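The overdispersion that zero inflation induces, which comparisons like the one above must contend with, can be seen directly by simulating zero‐inflated Poisson counts. An illustrative sketch (the parameter values are arbitrary, not those of the alcoholism study):

```python
import numpy as np

rng = np.random.default_rng(42)
n, lam, pi = 100_000, 4.0, 0.3

# ZIP draw: a latent Bernoulli picks the structural-zero state,
# otherwise the count comes from a plain Poisson(lam).
is_structural_zero = rng.random(n) < pi
y = np.where(is_structural_zero, 0, rng.poisson(lam, n))

# Zero inflation induces overdispersion relative to a plain Poisson:
# for ZIP, E[Y] = (1-pi)*lam and Var[Y] = (1-pi)*lam*(1 + pi*lam).
print(y.mean(), y.var())
```

Here the theoretical mean is 2.8 but the variance is 6.16, so a plain Poisson fit would be badly misspecified even before the excess zeros are considered.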

4.
We propose functional linear models for zero‐inflated count data, with a focus on the functional hurdle and functional zero‐inflated Poisson (ZIP) models. While the hurdle model assumes the counts come from a mixture of a degenerate distribution at zero and a zero‐truncated Poisson distribution, the ZIP model considers a mixture of a degenerate distribution at zero and a standard Poisson distribution. We extend the generalized functional linear model framework with a functional predictor and multiple cross‐sectional predictors to model counts generated by a mixture distribution. We propose an estimation procedure for functional hurdle and ZIP models, called penalized reconstruction, geared towards error‐prone and sparsely observed longitudinal functional predictors. The approach relies on dimension reduction and pooling of information across subjects, involving basis expansions and penalized maximum likelihood techniques. The developed functional hurdle model is applied to modeling hospitalizations within the first 2 years from initiation of dialysis, with a high percentage of zeros, in the Comprehensive Dialysis Study participants. Hospitalization counts are modeled as a function of sparse longitudinal measurements of serum albumin concentrations, patient demographics, and comorbidities. Simulation studies are used to study finite-sample properties of the proposed method and include comparisons with an adaptation of standard principal components regression. Copyright © 2014 John Wiley & Sons, Ltd.

5.
The zero‐inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non‐susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well‐suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero‐inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the finite sample performance of MZINB is compared with marginalized zero‐inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied in the evaluation of a school‐based fluoride mouthrinse program on dental caries in 677 children. Copyright © 2015 John Wiley & Sons, Ltd.
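The gap between latent-class and overall exposure effects that motivates marginalization can be shown numerically. In the sketch below, all coefficient values are hypothetical and the count part is reduced to its mean; when the zero-inflation probability depends on the exposure, the rate ratio in the susceptible class differs from the rate ratio in the full mixture:

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical zero-inflated model: susceptible-class mean exp(b0 + b1*x),
# zero-inflation probability expit(g0 + g1*x), binary exposure x.
b0, b1 = 1.0, 0.5      # count-part (latent-class) coefficients
g0, g1 = -1.0, 0.8     # inflation-part coefficients

def overall_mean(x):
    """Overall mean = P(susceptible) * susceptible-class mean."""
    return (1 - expit(g0 + g1 * x)) * np.exp(b0 + b1 * x)

latent_rr = np.exp(b1)                           # rate ratio within the susceptible class
marginal_rr = overall_mean(1) / overall_mean(0)  # rate ratio in the full mixture
print(latent_rr, marginal_rr)
```

Because the exposure also raises the inflation probability here, the marginal rate ratio is attenuated relative to exp(b1); a marginalized model parameterizes the overall mean directly so that its coefficients carry the population-averaged interpretation.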

6.
In this paper, we develop an estimation procedure for the parameters of a zero‐inflated over‐dispersed/under‐dispersed count model in the presence of missing responses. In particular, we deal with a zero‐inflated extended negative binomial model in the presence of missing responses. A weighted expectation-maximization algorithm is used for maximum likelihood estimation of the parameters involved. Some simulations are conducted to study the properties of the estimators. The procedure is shown to be robust when the count data follow other over‐dispersed models, such as the log‐normal mixture of the Poisson distribution, or even a zero‐inflated Poisson model. An illustrative example and a discussion leading to some conclusions are given. Copyright © 2016 John Wiley & Sons, Ltd.
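The EM idea underlying such procedures can be sketched for the simplest case: an intercept-only zero-inflated Poisson with fully observed responses (our simplification of the abstract's weighted EM: no missing-data weights, no covariates, and Poisson rather than extended negative binomial):

```python
import numpy as np

def zip_em(y, n_iter=200):
    """Plain EM for an intercept-only zero-inflated Poisson.
    Latent variable: indicator that an observed zero is structural."""
    y = np.asarray(y, dtype=float)
    pi, lam = 0.5, max(y.mean(), 0.1)  # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each zero is structural
        z = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
        # M-step: closed-form updates given the responsibilities
        pi = z.mean()
        lam = ((1 - z) * y).sum() / (1 - z).sum()
    return pi, lam
```

A weighted version, as in the abstract, would multiply each observation's contribution in both steps by an inverse-probability weight for response.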

7.
In practice, count data may exhibit varying dispersion patterns and excessive zero values; additionally, they may appear in groups or clusters sharing a common source of variation. We present a novel Bayesian approach for analyzing such data. To model these features, we combine the Conway‐Maxwell‐Poisson distribution, which allows both overdispersion and underdispersion, with a hurdle component for the zeros and random effects for clustering. We propose an efficient Markov chain Monte Carlo sampling scheme to obtain posterior inference from our model. Through simulation studies, we compare our hurdle Conway‐Maxwell‐Poisson model with a hurdle Poisson model to demonstrate the effectiveness of our Conway‐Maxwell‐Poisson approach. Furthermore, we apply our model to analyze an illustrative dataset containing information on the number and types of carious lesions on each tooth in a population of 9‐year‐olds from the Iowa Fluoride Study, which is an ongoing longitudinal study on a cohort of Iowa children that began in 1991.
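The dispersion flexibility of the Conway-Maxwell-Poisson distribution can be illustrated by computing its pmf directly, P(Y = k) ∝ λ^k/(k!)^ν. A minimal sketch with the infinite normalizing sum truncated (statistical software would handle the constant more carefully):

```python
import numpy as np
from scipy.special import gammaln

def cmp_pmf(k, lam, nu, kmax=500):
    """Conway-Maxwell-Poisson pmf; the normalizing sum is truncated
    at kmax, which is adequate for moderate lam."""
    ks = np.arange(kmax)
    log_weights = ks * np.log(lam) - nu * gammaln(ks + 1)
    log_z = np.logaddexp.reduce(log_weights)  # log normalizing constant
    return np.exp(k * np.log(lam) - nu * gammaln(k + 1) - log_z)

def cmp_mean_var(lam, nu, kmax=500):
    ks = np.arange(kmax)
    p = cmp_pmf(ks, lam, nu, kmax)
    m = (ks * p).sum()
    return m, ((ks - m) ** 2 * p).sum()
```

Here ν < 1 yields overdispersion (variance above the mean), ν > 1 underdispersion, and ν = 1 recovers the Poisson exactly.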

8.
Count data often arise in biomedical studies, sometimes with an excess of zeros among the observed counts. The zero‐inflated Poisson model provides a natural approach to accounting for these excess zero counts. In the semiparametric framework, we propose a generalized partially linear single‐index model for the mean of the Poisson component, the probability of zero, or both. We develop the estimation and inference procedure via a profile maximum likelihood method. Under some mild conditions, we establish the asymptotic properties of the profile likelihood estimators. The finite-sample performance of the proposed method is demonstrated by simulation studies, and the new model is illustrated with a medical care dataset. Copyright © 2014 John Wiley & Sons, Ltd.

9.
Zero‐inflated count outcomes arise quite often in research and practice. Parametric models such as the zero‐inflated Poisson and zero‐inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from the assumed distributions. Recently, new approaches have been proposed to provide distribution‐free, or semiparametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero‐inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)‐based variable selection to these new models. Variable selection has been gaining popularity in modern clinical research, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient‐centered outcomes research. Such moderation analysis generally creates many explanatory variables in regression analysis, and the advantages of SCAD‐based methods over their traditional counterparts make them a great choice for addressing this important and timely issue in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
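The SCAD penalty referenced above has a closed form (Fan and Li, 2001). A small sketch with their suggested tuning constant a = 3.7:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001): LASSO-like (linear) near zero,
    quadratic taper on (lam, a*lam], and constant beyond a*lam."""
    t = np.abs(theta)
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    return np.where(small, lam * t,
           np.where(mid, (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    lam**2 * (a + 1) / 2))
```

Unlike the LASSO's linear penalty, SCAD levels off beyond aλ, so large coefficients escape shrinkage; this is the property behind the oracle behavior that makes SCAD attractive for moderation analyses with many candidate effects.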

11.
Zero‐inflated Poisson (ZIP) and zero‐inflated negative binomial (ZINB) models are widely used to model zero‐inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non‐risk group in the population, the ZIP (ZINB) models a two‐component population mixture. As in applications of the Poisson and NB, the key difference between ZIP and ZINB is that the ZINB allows for overdispersion, through its NB component, in modeling the count response for the at‐risk group. Overdispersion arising in practice often does not follow the NB, however, and applying the ZINB to such data yields invalid inference. If the sources of overdispersion are known, other parametric models may be used to model the overdispersion directly, but such models are likewise subject to distributional assumptions; moreover, this approach is not applicable when information about the sources of overdispersion is unavailable. In this paper, we propose a distribution‐free alternative and compare its performance with these popular parametric models as well as with a moment‐based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390–2405]. Like the generalized estimating equations, the proposed approach requires no elaborate distributional assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero‐inflated responses. We illustrate our approach with both simulated and real study data. Copyright © 2015 John Wiley & Sons, Ltd.

12.
One of the most controversial dimensions along which developing therapeutic approaches for bulimia can be differentiated is their allegiance to an “abstinence” or “nonabstinence” model. Through analogy to traditional treatment programs for chemical dependency, many self-help and professional programs for bulimia hold that the complete elimination of binge-vomiting behavior is a prerequisite for therapeutic work, and require abstinence from the inception of treatment. In contrast, the nonabstinence model suggests that a more gradual reduction in the frequency of episodes may be preferable in that it provides more opportunities for relapse prevention training and avoids reinforcing dichotomous thinking styles. The present paper reviews the theoretical and clinical arguments that have been advanced by each side, including the case for classifying bulimia as a substance abuse disorder. A strategy for investigating the relative efficacy of the two approaches is proposed. It is suggested that particular attention be paid to such variables as differential attrition, the effect of each modality on the accuracy of self-report, the need for continuing or supplementary therapy, the occurrence of treatment “casualties,” interactions between client characteristics and mode of therapy, and long-term results. In the interim before such data are available, a reasonable clinical recommendation may be the implementation of a “compromise” approach designed to maximize the advantages claimed by each model while minimizing possible risks.

15.
We take a functional data approach to longitudinal studies with complex bivariate outcomes. This work is motivated by data from a physical activity study that measured 2 responses over time in 5‐minute intervals. One response is the proportion of time active in each interval, a continuous proportion with excess zeros and ones. The other response, the energy expenditure rate in the interval, is a continuous variable with excess zeros and skewness. This outcome is complex because there are 3 possible activity patterns in each interval (inactive, partially active, and completely active), and those patterns, which are observed, induce both nonrandom and random associations between the responses. More specifically, the inactive pattern requires a zero value for both the proportion of active behavior and the energy expenditure rate; a partially active pattern means that the proportion of activity is strictly between zero and one and that the energy expenditure rate is greater than zero and likely to be moderate; and the completely active pattern means that the proportion of activity is exactly one and that the energy expenditure rate is greater than zero and likely to be higher. To address these challenges, we propose a 3‐part functional data joint modeling approach. The first part is a continuation‐ratio model for the 3 ordinal activity patterns. The second part models the proportions when they lie in the interval (0,1). The last component specifies the skewed continuous energy expenditure rate, via Box‐Cox transformations, when it is greater than zero. In this 3‐part model, the regression structures are specified as smooth curves measured at various time points, with random effects that have a correlation structure. The smoothed random curves for each variable are summarized by a few important principal components, and the association of the 3 longitudinal components is modeled through the association of the principal component scores.
The difficulties in handling the ordinal and proportional variables are addressed using a quasi‐likelihood-type approximation. We develop an efficient algorithm to fit the model, which also involves selecting the number of principal components. The method is applied to the physical activity data and is evaluated empirically in a simulation study.
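The Box-Cox step of the third model component can be sketched with SciPy, using lognormal draws as a stand-in for skewed, strictly positive energy expenditure rates (illustrative only, not the study's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Skewed positive "energy expenditure" values (lognormal stand-in)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Box-Cox: y = (x**lam - 1)/lam for lam != 0, log(x) at lam = 0;
# scipy.stats.boxcox chooses lam by maximum likelihood.
y, lam_hat = stats.boxcox(x)
print(lam_hat, stats.skew(x), stats.skew(y))
```

For lognormal data the fitted exponent is near zero (the log transform), and the sample skewness of the transformed values is close to zero, which is what makes the Gaussian smooth-curve machinery applicable to the positive part of the outcome.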
