Similar Documents
20 similar documents found.
1.
Zero-inflated Poisson regression is a popular tool for analyzing data with excessive zeros. Although much work has been done on fitting zero-inflated data, most models depend heavily on special features of the individual data; specifically, a sizable group of respondents endorses the same answer, giving the data peaks. In this paper, we propose a new model with the flexibility to model excessive counts other than zero. The model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of the excessive counts: zeros, K (where K is a positive integer) and all other values. The Poisson regression component models the counts that are assumed to follow a Poisson distribution. Two examples illustrate our models on data whose counts contain many ones and sixes. The zero-inflated and K-inflated models exhibit a better fit than the zero-inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.
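To make the mixture concrete, here is a minimal sketch (not the authors' code) of a zero-and-K-inflated Poisson log-likelihood; it drops the covariates and the multinomial logistic link of the paper's full model, and all names and the simulated data are invented for illustration.

```python
# Zero-and-K-inflated Poisson, intercept-only sketch: pi0 and piK are the
# excess probabilities of 0 and K, lam is the Poisson rate.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

K = 6  # the extra inflated count, e.g. the "sixes" example in the abstract

def negloglik(theta, y):
    a0, aK, b = theta
    # softmax keeps (pi0, piK, 1 - pi0 - piK) a valid probability vector
    e = np.exp([a0, aK, 0.0])
    pi0, piK, rest = e / e.sum()
    lam = np.exp(b)  # log link keeps the Poisson rate positive
    p = rest * poisson.pmf(y, lam)
    p = np.where(y == 0, p + pi0, p)
    p = np.where(y == K, p + piK, p)
    return -np.sum(np.log(p))

rng = np.random.default_rng(1)
y = np.concatenate([rng.poisson(3.0, 800), np.zeros(100, int), np.full(100, K)])
fit = minimize(negloglik, x0=[0.0, 0.0, 1.0], args=(y,), method="Nelder-Mead")
print(fit.x)  # roughly recovers pi0 ~ piK ~ 0.1 and log(lam) ~ log(3) here
```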

2.
Zero-inflated Poisson (ZIP) and negative binomial (ZINB) models are widely used to model zero-inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non-risk group in the population, the ZIP (ZINB) models a two-component population mixture. As in applications of Poisson and NB, the key difference between ZIP and ZINB is the allowance for overdispersion by the ZINB in its NB component in modeling the count response for the at-risk group. Overdispersion arising in practice too often does not follow the NB, and applications of ZINB to such data yield invalid inference. If sources of overdispersion are known, other parametric models may be used to directly model the overdispersion. Such models too are subject to assumed distributions. Further, this approach may not be applicable if information about the sources of overdispersion is unavailable. In this paper, we propose a distribution-free alternative and compare its performance with these popular parametric models as well as a moment-based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390–2405]. Like the generalized estimating equations, the proposed approach requires no elaborate distribution assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero-inflated responses. We illustrate our approach with both simulated and real study data. Copyright © 2015 John Wiley & Sons, Ltd.
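As a hedged illustration of the two parametric baselines the abstract compares, the sketch below fits ZIP and ZINB with statsmodels (whose count_model module provides both); the simulated data and variable names are invented for the example.

```python
# Fit ZIP and ZINB to counts with both structural zeros and overdispersion.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import (
    ZeroInflatedPoisson, ZeroInflatedNegativeBinomialP)

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)
at_risk = rng.random(n) > 0.3                 # ~30% structural (non-risk) zeros
lam = np.exp(0.5 + 0.4 * x) * rng.gamma(2.0, 0.5, n)  # gamma frailty => NB-type overdispersion
y = np.where(at_risk, rng.poisson(lam), 0)

infl = np.ones((n, 1))                        # intercept-only inflation part
zip_fit = ZeroInflatedPoisson(y, X, exog_infl=infl).fit(disp=0)
zinb_fit = ZeroInflatedNegativeBinomialP(y, X, exog_infl=infl).fit(disp=0)
print(zip_fit.aic, zinb_fit.aic)              # AIC comparison flags the overdispersion
```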

3.
In a binary model relating a response variable Y to a risk factor X, account may need to be taken of an extraneous effect Z that is related to X, but not to Y. This is known as the association pattern Y–X–Z. The extraneous variable Z is commonly included in models as a covariate. This paper concerns binary models and investigates the use of deviation from the group mean (D-GM) and deviation from the fitted fractional polynomial value (D-FP) for removing the extraneous effect of Z. In a simulation study, D-FP performed excellently, while the performance of D-GM was slightly worse than the traditional method of treating Z as a covariate. In addition, estimators with excessive mean square errors or standard errors cannot occur when D-GM or D-FP is employed, even in small or sparse data sets. The Y–X–Z association pattern studied here often occurs in fetal studies, where the fetal measurement (X) varies with the gestational age (Z), but gestational age does not relate to the outcome variable (Y; e.g. Down's syndrome). D-GM and D-FP perform well with illustrative data from fetal studies, even with a weak association between X and Z and a low proportion of case subjects (e.g. 11:1, control to case). It is not necessary to add a new covariate when a model deals with the extraneous effect. The D-FP and D-GM methods perform well with the real data studied here, and moreover, D-FP demonstrated excellent performance in simulations. Copyright © 2009 John Wiley & Sons, Ltd.

4.
Different amounts of sodium stearoyl lactylate (SSL) (X1) and azodicarbonamide (ADA) (X2) were analyzed to measure their effect on breadmaking using wheat flour with pea flour (Pisum sativum) incorporated into the dough. The objective of the present work was to optimize the physical properties of the dough (Y1, Y2, Y3, Y4), the dough consistency during mixing (Y5, Y6) and the baking performance (Y7, Y8, Y9). A central composite design and second-order models for each Yi were employed. For the dough physical properties and dough consistency during mixing, the best response was found when SSL varied between 0.5 and 1.5% and ADA between 110 and 170 ppm. For the baking-performance responses, better values for specific volume, crumb texture scores and bread score were obtained using SSL between 0.9 and 1.4% and ADA between 50 and 80 ppm. It is concluded that, for baking with wheat flour in which about 10% is replaced with inactivated pea flour, SSL at levels close to 1% with ADA between 50 and 80 ppm is advisable.
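The second-order model referred to here is the usual quadratic response surface fitted to central composite design points; the following sketch shows the idea with invented data values, using ordinary least squares.

```python
# Quadratic response surface for a two-factor central composite design:
# Y = b0 + b1*X1 + b2*X2 + b11*X1^2 + b22*X2^2 + b12*X1*X2.
import numpy as np

# coded design points: 4 factorial + 4 axial + 3 center runs
X1 = np.array([-1, -1, 1, 1, -1.414, 1.414, 0, 0, 0, 0, 0])
X2 = np.array([-1, 1, -1, 1, 0, 0, -1.414, 1.414, 0, 0, 0])
Y = np.array([6.2, 5.8, 7.1, 6.5, 5.9, 6.8, 6.0, 5.7, 7.4, 7.3, 7.5])  # invented

D = np.column_stack([np.ones_like(X1), X1, X2, X1**2, X2**2, X1 * X2])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
print(beta)  # b0, b1, b2, b11, b22, b12; the optimum is read off this surface
```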

5.
Health services data often contain a high proportion of zeros. In studies examining patient hospitalization rates, for instance, many patients will have no hospitalizations, resulting in a count of zero. When the number of zeros is greater or less than expected under a standard count model, the data are said to be zero modified relative to the standard model. A similar phenomenon arises with semicontinuous data, which are characterized by a spike at zero followed by a continuous distribution with positive support. When analyzing zero-modified count and semicontinuous data, flexible mixture distributions are often needed to accommodate both the excess zeros and the typically skewed distribution of nonzero values. Various models have been introduced over the past three decades to accommodate such data, including hurdle models, zero-inflated models, and two-part semicontinuous models. This tutorial describes recent modeling strategies for zero-modified count and semicontinuous data and highlights their role in health services research studies. Part 1 of the tutorial, presented here, provides a general overview of the topic. Part 2, appearing as a companion piece in this issue of Statistics in Medicine, discusses three case studies illustrating applications of the methods to health services research. Copyright © 2016 John Wiley & Sons, Ltd.
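For orientation, the hurdle and zero-inflated families mentioned in this tutorial modify the zero mass of a base count density f in two standard ways (notation here is generic, not the tutorial's):

```latex
% Hurdle: all zeros come from the binary part; positives follow a truncated f.
\text{Hurdle:}\quad P(Y = 0) = \pi, \qquad
P(Y = y) = (1 - \pi)\,\frac{f(y)}{1 - f(0)}, \quad y > 0
% Zero-inflated: a mixture adds structural zeros on top of f's own zeros.
\text{Zero-inflated:}\quad P(Y = 0) = \pi + (1 - \pi)\,f(0), \qquad
P(Y = y) = (1 - \pi)\,f(y), \quad y > 0
```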

6.
In health research, count outcomes are fairly common, and often these counts have a large number of zeros. In order to adjust for these extra zero counts, various modifications of the Poisson regression model have been proposed. Lambert (Lambert, D., Technometrics 34, 1–14, 1992) described a zero-inflated Poisson (ZIP) model that is based on a mixture of a binary distribution degenerate at zero (with probability p_i) and a Poisson(λ_i) distribution. Depending on the relationship between p_i and λ_i, she described two variants: a ZIP and a ZIP(τ) model. In this paper, we extend these models to the case of clustered data (e.g., patients observed within hospitals) and describe random-effects ZIP and ZIP(τ) models. These models are appropriate for the analysis of clustered extra-zero Poisson count data. The distribution of the random effects is assumed to be normal, and a maximum marginal likelihood estimation method is used to estimate the model parameters. We applied these models to data from patients who underwent colon operations at 123 Veterans Affairs Medical Centers in the National VA Surgical Quality Improvement Program.
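Written out, Lambert's ZIP mixture takes the standard form below; the ZIP(τ) variant constrains the two parts through a single extra parameter.

```latex
% ZIP: with probability p_i the count is a structural zero, otherwise Poisson.
P(Y_i = 0) = p_i + (1 - p_i)\,e^{-\lambda_i}, \qquad
P(Y_i = k) = (1 - p_i)\,\frac{e^{-\lambda_i}\lambda_i^{k}}{k!}, \quad k = 1, 2, \ldots
% ZIP(tau) ties the two parts together:
\operatorname{logit}(p_i) = -\tau \log(\lambda_i)
```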

7.
Calibration, that is, whether observed outcomes agree with predicted risks, is important when evaluating risk prediction models. For dichotomous outcomes, several tools exist to assess different aspects of model calibration, such as calibration-in-the-large, logistic recalibration, and (non-)parametric calibration plots. We aim to extend these tools to prediction models for polytomous outcomes. We focus on models developed using multinomial logistic regression (MLR): outcome Y with k categories is predicted using k − 1 equations comparing each category i (i = 2, …, k) with reference category 1 using a set of predictors, resulting in k − 1 linear predictors. We propose a multinomial logistic recalibration framework that involves an MLR fit where Y is predicted using the k − 1 linear predictors from the prediction model. A non-parametric alternative may use vector splines for the effects of the linear predictors. The parametric and non-parametric frameworks can be used to generate multinomial calibration plots. Further, the parametric framework can be used for the estimation and statistical testing of calibration intercepts and slopes. Two illustrative case studies are presented, one on the diagnosis of malignancy of ovarian tumors and one on residual mass diagnosis in testicular cancer patients treated with cisplatin-based chemotherapy. The risk prediction models were developed on data from 2037 and 544 patients and externally validated on 1107 and 550 patients, respectively. We conclude that calibration tools can be extended to polytomous outcomes. The polytomous calibration plots are particularly informative through their visual summary of calibration performance. Copyright © 2014 John Wiley & Sons, Ltd.
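A minimal sketch of the parametric recalibration step, assuming the k − 1 linear predictors have already been computed from the development model; the simulated data and the use of scikit-learn are illustrative, not the authors' implementation.

```python
# Refit a multinomial logit of the observed outcome on the k-1 linear
# predictors from the original prediction model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, k = 1000, 3
lp = rng.normal(size=(n, k - 1))            # k-1 linear predictors from the model
eta = np.column_stack([np.zeros(n), lp])    # category 1 is the reference
probs = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
y = np.array([rng.choice(k, p=p) for p in probs])

# lbfgs fits a multinomial logit for multiclass y; large C ~ unpenalized ML
recal = LogisticRegression(C=1e6).fit(lp, y)
# sklearn uses a symmetric (softmax) parameterization; subtracting the
# reference row gives reference-coded calibration intercepts and slopes,
# which are near 0 and the identity here since the data are well calibrated.
print(recal.intercept_[1:] - recal.intercept_[0])
print(recal.coef_[1:] - recal.coef_[0])
```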

8.
Authors have proposed new methodology in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, H0 : P(D = 1 | X,Y) = P(D = 1 | X). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We also investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance and suggests that the problem of insensitivity has to do with use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing for no improvement in performance. Copyright © 2013 John Wiley & Sons, Ltd.

9.
Cornstarch/sorghum flour ratio (X1), water added (X2) and the amount of hydroxypropyl methylcellulose (HPMC) used (X3) were varied in making gluten-free bread so as to optimize batter softness (Y1), specific volume (Y2) and crumb grain (Y3). A second-order model was employed to generate a response surface. The softness of the batter was found to depend significantly and linearly on all three factors. The specific volume (Y2), in particular, increased significantly with X1 and X3. The crumb grain (Y3) depended significantly on all three factors; its scores increased with X1 and decreased with the water added (X2). Finally, a cornstarch/sorghum flour ratio of 0.55, 90% water added and 3% HPMC were chosen as the best conditions, considering acceptable levels of specific volume and crumb grain, and also taking into account the possibility of using the highest proportion of sorghum flour.

10.
This article explores Bayesian joint models for a quantile of a longitudinal response, a mismeasured covariate and an event time outcome, with an attempt to (i) characterize the entire conditional distribution of the response variable through quantile regression, which may be more robust to outliers and misspecification of the error distribution; (ii) adjust for measurement error, evaluate non-ignorable missing observations, and accommodate departures from normality in the covariate; and (iii) overcome the lack of confidence in specifying a time-to-event model. When statistical inference is carried out for a longitudinal data set with non-central location, non-linearity, non-normality, measurement error, and missing values, as well as an interval-censored event time, it is important to account for these data features simultaneously in order to obtain more reliable and robust inferential results. Toward this end, we develop a Bayesian joint modeling approach to simultaneously estimate all parameters in three models: a quantile-regression-based nonlinear mixed-effects model for the response using the asymmetric Laplace distribution, a linear mixed-effects model with a skew-t distribution for the mismeasured covariate in the presence of informative missingness, and an accelerated failure time model with an unspecified nonparametric distribution for the event time. We apply the proposed modeling approach to an AIDS clinical data set and conduct simulation studies to assess the performance of the proposed joint models and method. Copyright © 2016 John Wiley & Sons, Ltd.

11.
To address the objective in a clinical trial of estimating the mean or mean difference of an expensive endpoint Y, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semiparametric efficient estimator is applied. This approach is made efficient by specifying the phase-two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it has apparently seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. We perform simulations to identify settings where the optimal approach significantly improves efficiency compared with approaches in current practice. We provide proofs and R code. The optimality results are used to design an HIV vaccine trial, with the objective of comparing the mean 'importance-weighted' breadth (Y) of the T-cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere from no efficiency gain to a large one (up to 24% in the examples) compared with the approach using the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y | W] is important for realizing the efficiency gain, which is aided by an ample phase-two sample and by using a robust fitting method. Copyright © 2013 John Wiley & Sons, Ltd.
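One standard Neyman-type form of such cost-standardized optimal selection probabilities, sampling in proportion to sd(Y | W) divided by the square root of cost, is sketched below; this is an assumption-laden illustration, not necessarily the exact rule derived in the paper.

```python
# Phase-two sampling probabilities proportional to sd(Y | W) / sqrt(cost),
# scaled to a sampling budget and truncated at 1.
import numpy as np

def optimal_probs(sd_given_w, cost, budget_mean):
    """Scale sqrt-variance-per-cost weights so the expected phase-two
    fraction matches the budget, truncating probabilities at 1."""
    w = sd_given_w / np.sqrt(cost)
    p = budget_mean * w / w.mean()
    return np.clip(p, 0.0, 1.0)

rng = np.random.default_rng(0)
W = rng.normal(size=5000)
sd_hat = 1.0 + 0.8 * np.abs(W)        # working model for sd(Y | W), illustrative
p = optimal_probs(sd_hat, cost=2.0, budget_mean=0.25)
phase2 = rng.random(5000) < p         # where the expensive Y gets measured
print(p.mean(), phase2.mean())
```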

12.
In this paper, we explore inference in multi-response, nonlinear models. By multi-response, we mean models with m > 1 response variables and accordingly m relations. Each parameter/explanatory variable may appear in one or more of the relations. We study a system estimation approach for simultaneous computation and inference of the model and (co)variance parameters. For illustration, we fit a bivariate Emax model to diabetes dose-response data. Further, the bivariate Emax model is used in a simulation study that compares the system estimation approach to equation-by-equation estimation. We conclude that overall, the system estimation approach performs better for the bivariate Emax model when there are dependencies among relations. The stronger the dependencies, the more we gain in precision by using system estimation rather than equation-by-equation estimation. Copyright © 2015 John Wiley & Sons, Ltd.

13.
The problem of competing risks analysis arises often in public health, demography, actuarial science, industrial reliability applications, and experiments in medical therapeutics. In the classical competing risks scenario, one models the risks with a vector T = (T1,…,Tk) of non-negative random variables that represents the potential times to death from k risks. One cannot see T directly but sees instead Y = min(T1,…,Tk) and the actual cause of death. The major difficulty with this analysis is the requirement for the expert to specify the single cause of death, which may not in fact be the actual cause. This paper addresses competing risks analysis for situations where one observes Y and a set of several possible causes of death specified by the expert. Often several causes act together, and realistically it is impossible for the expert to assign a death to a single cause. In particular, I provide a likelihood for parametric competing risks analysis when the actual cause of death is possibly misclassified. The data include the time to death, Y, and a set of possible causes of death. If misclassification probabilities are unknown, I propose a Bayesian analysis based on a prior distribution for the parameters of interest and for the misclassification probabilities.

14.
This article proposes a Bayesian mixed effects zero inflated discrete Weibull (ZIDW) regression model for zero inflated and highly skewed longitudinal count data, as an alternative to mixed effects regression models that are based on the negative binomial, zero inflated negative binomial, and conventional discrete Weibull (DW) distributions. The mixed effects ZIDW regression model is an extension of a recently introduced model based on the DW distribution and uses the log-link function to specify the relationship between the linear predictors and the median counts. The ZIDW approach offers a more robust characteristic of central tendency, compared to the mean count, when there is skewness in the data. A matrix generalized half-t (MGH-t) prior distribution is specified for the random effects covariance matrix as an alternative to the widely used Wishart prior distribution. The methodology is applied to a longitudinal dataset from an epilepsy clinical trial. In a data contamination simulation study, we show that the mixed effect ZIDW regression model is more robust than the competing mixed effects regression models when the data contain excess zeros or outliers. The performance of the ZIDW regression model is also assessed in a simulation study under the specification of, respectively, the MGH-t and Wishart prior distributions for the random effects covariance matrix. It turns out that the highest posterior density intervals under the MGH-t prior for the fixed effects maintain nominal coverage when the true variability between random slopes over time is small, whereas those under the Wishart prior are generally conservative.
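For reference, the type-I (Nakagawa-Osaki) discrete Weibull with zero inflation is standardly written as follows; the π, q, β notation here is generic rather than the article's.

```latex
% ZIDW: probability \pi of a structural zero, otherwise a DW(q, \beta) count
% (the DW itself puts mass 1 - q at zero, so the zero mass is inflated).
P(Y = 0) = \pi + (1 - \pi)(1 - q), \qquad
P(Y = y) = (1 - \pi)\left(q^{\,y^{\beta}} - q^{\,(y+1)^{\beta}}\right), \quad y = 1, 2, \ldots
```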

15.
While causal mediation analysis has seen considerable recent development for a single measured mediator (M) and final outcome (Y), less attention has been given to repeatedly measured M and Y. Previous methods have typically involved discrete-time models that limit inference to the particular measurement times used and do not recognize the continuous nature of the mediation process over time. To overcome such limitations, we present a new continuous-time approach to causal mediation analysis that uses a differential equations model in a potential outcomes framework to describe the causal relationships among model variables over time. A connection between the differential equation models and standard repeated measures models is made to provide convenient model formulation and fitting. A continuous-time extension of the sequential ignorability assumption allows for identifiable natural direct and indirect effects as functions of time, with estimation based on a two-step approach to model fitting in conjunction with a continuous-time mediation formula. Novel features include a measure of an overall mediation effect based on the "area between the curves," and an approach for predicting the effects of new interventions. Simulation studies show good properties of estimators and the new methodology is applied to data from a cohort study to investigate sugary drink consumption as a mediator of the effect of socioeconomic status on dental caries in children.
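A toy sketch of the idea, assuming a linear two-equation ODE system invented purely for illustration: integrating the gap in Y(t) between runs with and without the exposure-to-mediator pathway gives an "area between the curves" style indirect-effect summary.

```python
# Toy continuous-time mediation: exposure x shifts the mediator M, which
# drives Y; all rates are invented, this is not the authors' model.
import numpy as np
from scipy.integrate import odeint

def system(state, t, x, a):
    M, Y = state
    dM = a * x - 0.5 * M              # mediator responds to exposure, decays
    dY = 0.8 * M + 0.2 * x - 0.3 * Y  # outcome driven by mediator + direct path
    return [dM, dY]

t = np.linspace(0.0, 10.0, 200)
full = odeint(system, [0.0, 0.0], t, args=(1.0, 1.0))     # x -> M pathway active
blocked = odeint(system, [0.0, 0.0], t, args=(1.0, 0.0))  # x -> M pathway blocked
indirect = np.trapz(full[:, 1] - blocked[:, 1], t)        # area between the curves
print(indirect)
```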

16.
For a continuous outcome in a two-arm trial that satisfies normal distribution assumptions, the standardized mean difference can be transformed, through the cumulative distribution function, into the effect size measure P(X < Y). This measure is already established within engineering as the reliability parameter in stress–strength models, where Y represents the strength of a component and X represents the stress the component undergoes. If X is greater than Y, the component will fail. In this paper, we consider the closely related effect size measure λ = P(X < Y) − P(Y < X). This measure is also known as Somers' d, introduced by Somers in 1962 as an ordinal measure of association, and we explore it as a treatment effect size for a continuous outcome. Although point estimates for λ are easily calculated, the interval is not so readily obtained. We compare kernel density estimation and bootstrap and jackknife methods for estimating confidence intervals against two further methods for estimating P(X < Y) and their respective intervals, one of which makes no assumption about the underlying distribution and the other of which assumes a normal distribution. Simulations show that the choice of the best estimator depends on the value of λ, the variability within the data, and the underlying distribution of the data. Copyright © 2012 John Wiley & Sons, Ltd.
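A hedged sketch of one distribution-free route to P(X < Y): the Mann-Whitney statistic divided by the number of pairs, with a bootstrap percentile interval; the simulated data and interval details are illustrative.

```python
# Nonparametric P(X < Y) via the Mann-Whitney U statistic, plus a
# bootstrap percentile interval.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 80)    # e.g. control arm
y = rng.normal(0.5, 1.0, 80)    # e.g. treatment arm

def p_x_lt_y(x, y):
    u, _ = mannwhitneyu(y, x)   # U counts pairs with y > x (ties count 1/2)
    return u / (len(x) * len(y))

est = p_x_lt_y(x, y)
boot = [p_x_lt_y(rng.choice(x, len(x)), rng.choice(y, len(y)))
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(est, (lo, hi))            # lambda = 2*P(X < Y) - 1 follows directly
```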

17.
Data can arise as a length-biased sample rather than as a random sample; e.g. a sample of patients in hospitals or of network cable lines (experimental units with longer stays or longer lines have greater likelihoods of being sampled). The distribution arising from a single length-biased sampling (LBS) time has been derived (e.g., The Statistical Analysis of Discrete Time Events, Oxford Press: London, 1972) and applies when the observed outcome relates to the random variable subjected to LBS. Zelen (Breast Cancer: Trends in Research and Treatment, Raven Press: New York, 1976; 287–301) noted that cases of disease detected by a screening program likewise form a length-biased sample among all cases, since longer sojourn times afford greater likelihoods of being screen detected. In contrast to the samples of hospital stays and cable lines, however, the length-biased sojourns (preclinical durations) cannot be observed, although their subsequent clinical durations (survival times) can. This article quantifies the effect of LBS of the sojourn times (preclinical durations) on the distribution of the observed clinical durations when cases undergo periodic screening for the early detection of disease. We show that, when preclinical and clinical durations are positively correlated, the mean, median, and quartiles of the distribution of the clinical duration of screen-detected cases can be substantially inflated, even in the absence of any survival benefit from the screening procedure. Screening studies that report mean survival time need to take account of the fact that, even in the absence of any real benefit, mean survival among screen-detected cases will be longer than that among interval cases or cases arising in the control arm, above and beyond lead-time bias, simply by virtue of the LBS phenomenon. Published in 2009 by John Wiley & Sons, Ltd.

18.
Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from the assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection to these new models. Variable selection has been gaining popularity in modern clinical research, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcomes research. Such moderation analysis generally creates many explanatory variables in a regression analysis, and the advantages of SCAD-based methods over their traditional counterparts make them a great choice for addressing these important and timely issues in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.

19.
Continuous-time multistate survival models can be used to describe health-related processes over time. In the presence of interval-censored times for transitions between the living states, the likelihood is constructed using transition probabilities. Models can be specified using parametric or semiparametric shapes for the hazards. Semiparametric hazards can be fitted using P-splines and penalised maximum likelihood estimation. This paper presents a method to estimate flexible multistate models that allow for parametric and semiparametric hazard specifications. The estimation is based on a scoring algorithm. The method is illustrated with data from the English Longitudinal Study of Ageing.

20.
When modelling "social bads," such as illegal drug consumption, researchers are often faced with a dependent variable characterised by a large number of zero observations. Building on the recent literature on hurdle and double-hurdle models, we propose a double-inflated modelling framework, in which the zero observations are allowed to come from the following: nonparticipants; participant misreporters (who have larger loss functions associated with a truthful response); and infrequent consumers. Owing to our empirical application, the model is derived for the case of an ordered discrete dependent variable. However, it is similarly possible to augment other zero-inflated models (e.g., zero-inflated count models and double-hurdle models for continuous variables). The model is then applied to a consumer choice problem of cannabis consumption. We estimate that 17% of the reported zeros in the cannabis survey are from individuals who misreport their participation, 11% are from infrequent users, and only 72% are from true nonparticipants.
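In generic notation (invented here, not the authors'), the three zero sources decompose the observed zero mass as:

```latex
% Nonparticipants (\pi_N), misreporting participants (\pi_M), and consumers
% whose latent consumption Y* is genuinely zero in the reference period.
P(Y = 0) = \pi_N + \pi_M + (1 - \pi_N - \pi_M)\, P(Y^{*} = 0)
```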
