Similar Articles (20 results)
1.
Multistate Markov regression models used for quantifying the effect sizes of state-specific covariates pertaining to the dynamics of multistate outcomes have gained popularity. However, measurements of a multistate outcome are prone to classification error, particularly when a population-based survey or study relies on proxy measurements of the outcome for reasons of cost. Such misclassification may distort the effect sizes of relevant covariates, such as the odds ratios used in epidemiology. We propose a Bayesian measurement-error-driven hidden Markov regression model for calibrating these biased estimates, with and without a 2-stage validation design. A simulation algorithm was developed to assess various scenarios of underestimation and overestimation under nondifferential misclassification (independent of covariates) and differential misclassification (dependent on covariates). We applied the proposed method to a community-based survey of androgenetic alopecia and found that the effect sizes of the majority of covariates were inflated after calibration, regardless of the type of misclassification. The proposed Bayesian measurement-error-driven hidden Markov regression model is practicable and effective for calibrating the effects of covariates on a multistate outcome, but a prior distribution on the measurement errors accrued from a 2-stage validation design is strongly recommended.
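To make the misclassification mechanism concrete, here is a minimal simulation sketch: a 2-state hidden Markov chain observed through an error-prone proxy, where the naive transition estimate computed from the proxy drifts away from the truth. All transition and misclassification probabilities are hypothetical illustrative values, not estimates from the androgenetic alopecia survey, and this is not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state transition matrix and misclassification (emission)
# matrix; illustrative values only.
P = np.array([[0.9, 0.1],    # P[i, j] = P(state j at t+1 | state i at t)
              [0.2, 0.8]])
M = np.array([[0.95, 0.05],  # M[i, k] = P(observed k | true state i)
              [0.15, 0.85]])

def simulate(n_steps, start=0):
    """Simulate true states and their error-prone proxy measurements."""
    true = [start]
    for _ in range(n_steps - 1):
        true.append(rng.choice(2, p=P[true[-1]]))
    observed = [rng.choice(2, p=M[s]) for s in true]
    return np.array(true), np.array(observed)

true, obs = simulate(1000)
# The naive transition estimate from the proxy is biased relative to the truth.
naive = np.mean(obs[1:][obs[:-1] == 0])   # apparent P(1 | 0)
print(f"true P(1|0) = {P[0, 1]:.2f}, naive estimate = {naive:.2f}")
```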

2.
This paper studies the multiscale analysis of neural spike trains, through both graphical and Poisson process approaches. We introduce the interspike interval plot, which simultaneously visualizes characteristics of neural spiking activity at different time scales. Using an inhomogeneous Poisson process framework, we discuss multiscale estimates of the intensity functions of spike trains. We also introduce the windowing effect for two multiscale methods. Using quasi-likelihood, we develop bootstrap confidence intervals for the multiscale intensity function. We provide a cross-validation scheme for choosing the tuning parameters and study its unbiasedness. Studying the relationship between the spike rate and the stimulus signal, we observe that adjusting for the first-spike latency is important in cross-validation. We show, through examples, that the correlation between spike trains and spike count variability can be multiscale phenomena. Furthermore, we address the modeling of the periodicity of spike trains caused by a stimulus signal or by brain rhythms. Within the multiscale framework, we introduce intensity functions for spike trains with multiplicative and additive periodic components. Analyzing a dataset from the retinogeniculate synapse, we compare the fit of these models with the Bayesian adaptive regression splines method and discuss the limitations of the methodology. Computational efficiency, which is usually a challenge in the analysis of spike trains, is one of the highlights of these new models. In an example, we show that the reconstruction quality of a complex intensity function demonstrates the ability of the multiscale methodology to crack the neural code. Copyright © 2013 John Wiley & Sons, Ltd.
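As a concrete illustration of two basic ingredients here — interspike intervals and window-based intensity estimates at several scales — the sketch below simulates an inhomogeneous Poisson spike train by thinning and tabulates rates at three window widths. All parameter values are illustrative, not from the retinogeniculate data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Inhomogeneous Poisson spike train simulated by thinning: candidate events
# at rate lam_max are kept with probability lam(t)/lam_max.
T = 10.0
lam_max = 50.0
lam = lambda t: 25.0 + 20.0 * np.sin(2 * np.pi * t / T)  # lam(t) <= 45 < lam_max
cand = np.cumsum(rng.exponential(1 / lam_max, size=2000))
cand = cand[cand < T]
spikes = cand[rng.uniform(size=cand.size) < lam(cand) / lam_max]

# Interspike intervals: the quantity an interspike interval plot displays
# against spike time.
isi = np.diff(spikes)
print(f"{spikes.size} spikes, mean ISI {isi.mean():.3f}s")

# Crude multiscale intensity estimates: spike counts in windows of
# different widths, divided by the width.
for width in (0.1, 0.5, 2.0):
    edges = np.arange(0, T + width, width)
    counts, _ = np.histogram(spikes, bins=edges)
    print(f"window {width:>4}s: mean rate {counts.mean() / width:.1f} Hz")
```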

3.
Poor measurement of explanatory variables occurs frequently in observational studies. Error-prone observations may lead to biased estimation and loss of power in detecting the impact of explanatory variables on the response. We consider misclassified binary exposure in the context of case–control studies, assuming the availability of validation data to inform the magnitude of the misclassification. A Bayesian adjustment to correct the misclassification is investigated. Simulation studies show that the Bayesian method can have advantages over non-Bayesian counterparts, particularly in the face of a rare exposure, small validation sample sizes, and uncertainty about whether exposure misclassification is differential or non-differential. The method is illustrated via application to several real studies. Copyright © 2010 John Wiley & Sons, Ltd.
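As background, the textbook identity that any such adjustment builds on links the observed (error-prone) exposure prevalence to the true prevalence under nondifferential misclassification with sensitivity SE and specificity SP; it is a standard relation, not the paper's full Bayesian model.

```latex
% Observed exposure prevalence under nondifferential misclassification:
P(X^* = 1) = SE \cdot P(X = 1) + (1 - SP)\,\{1 - P(X = 1)\}
% Inverting gives the matrix-method (Rogan--Gladen type) correction:
P(X = 1) = \frac{P(X^* = 1) - (1 - SP)}{SE + SP - 1}
```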

4.
We develop a simulation-based procedure for determining the required sample size in binomial regression risk assessment studies when response data are subject to misclassification. A Bayesian average power criterion is used to determine a sample size that provides high probability, averaged over the distribution of potential future data sets, of correctly establishing the direction of association between predictor variables and the probability of event occurrence. The method is broadly applicable to any parametric binomial regression model, including, but not limited to, the popular logistic, probit, and complementary log–log models. We detail a common medical scenario wherein ascertainment of true disease status is impractical or otherwise impeded, and in its place the outcome of a single binary diagnostic test is used as a surrogate. These methods are then extended to the two-diagnostic-test setting. We illustrate the method with categorical covariates using an example that involves screening for human papillomavirus. This example, coupled with results from simulated data, highlights the utility of our Bayesian sample size procedure with error-prone measurements. Copyright © 2008 John Wiley & Sons, Ltd.
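A stripped-down version of such a simulation-based calculation might look like the sketch below: a logistic model with one binary predictor, a single imperfect diagnostic test as the surrogate outcome, and a design prior on the log odds ratio. The prior, sensitivity, and specificity values are assumptions for illustration, and the simple "CI excludes zero on the correct side" criterion stands in for the paper's Bayesian average power criterion.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

def average_power(n, n_sims=500, sens=0.90, spec=0.95):
    """Fraction of simulated future studies that establish a positive
    association when the true outcome is replaced by an error-prone test."""
    hits = 0
    for _ in range(n_sims):
        beta = rng.normal(1.0, 0.25)        # design prior on the log odds ratio
        x = rng.binomial(1, 0.5, size=n)    # binary predictor
        p = 1 / (1 + np.exp(-(-1.0 + beta * x)))
        y = rng.binomial(1, p)              # true disease status
        y_obs = np.where(y == 1,            # surrogate: one imperfect test
                         rng.binomial(1, sens, size=n),
                         rng.binomial(1, 1 - spec, size=n))
        fit = sm.Logit(y_obs, sm.add_constant(x)).fit(disp=0)
        lo, hi = fit.conf_int()[1]          # 95% CI for the slope
        hits += lo > 0                      # direction correctly established
    return hits / n_sims

for n in (100, 300, 600):
    print(n, average_power(n))
```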

5.
We consider Cox proportional hazards regression when the covariate vector includes error-prone discrete covariates along with error-free covariates, which may be discrete or continuous. The misclassification in the discrete error-prone covariates is allowed to be of any specified form. Building on the work of Nakamura and his colleagues, we present a corrected score method for this setting. The method can handle all three major study designs (internal validation design, external validation design, and replicate measures design), both functional and structural error models, and time-dependent covariates satisfying a certain 'localized error' condition. We derive the asymptotic properties of the method and indicate how to adjust the covariance matrix of the regression coefficient estimates to account for estimation of the misclassification matrix. We present the results of a finite-sample simulation study under Weibull survival with a single binary covariate having known misclassification rates. The performance of the method described here was similar to that of related methods we have examined in previous works. Specifically, our new estimator performed as well as or, in a few cases, better than the full Weibull maximum likelihood estimator. We also present simulation results for our method for the case where the misclassification probabilities are estimated from an external replicate measures study. Our method generally performed well in these simulations. The new estimator has a broader range of applicability than many other estimators proposed in the literature, including those described in our own earlier work, in that it can handle time-dependent covariates with an arbitrary misclassification structure. We illustrate the method on data from a study of the relationship between dietary calcium intake and distal colon cancer.

6.
Overdispersion and structural zeros are two major manifestations of departure from the Poisson assumption when modeling count responses using Poisson log-linear regression. As noted in a large body of literature, ignoring such departures could yield bias and lead to wrong conclusions. Different approaches have been developed to tackle these two major problems. In this paper, we review available methods for dealing with overdispersion and structural zeros within a longitudinal data setting and propose a distribution-free modeling approach to address the limitations of these methods by utilizing a new class of functional response models. We illustrate our approach with both simulated and real study data. Copyright © 2012 John Wiley & Sons, Ltd.

7.
In epidemiologic studies of the association between exposure and disease, misclassification of exposure is common and known to induce bias in the effect estimates. The nature of the bias is difficult to foretell. For this purpose, we present a simple method to assess the bias in Poisson regression coefficients for a categorical exposure variable subject to misclassification. We derive expressions for the category-specific coefficients from the regression on the error-prone exposure (naive coefficients) in terms of the coefficients from the regression on the true exposure (true coefficients). These expressions are similar for crude and adjusted models if we assume that the covariates are measured without error and that the misclassification probabilities are independent of the covariate values. We find that the bias in the naive coefficient for one category of the exposure variable depends on all true category-specific coefficients, weighted by misclassification probabilities. On the other hand, misclassification of an exposure variable does not induce bias in the estimates of the coefficients of the (perfectly measured) covariates. Similarities with linear regression models are pointed out. For selected scenarios of true exposure–disease associations and selected patterns of misclassification, we illustrate the inconsistency of naive Poisson regression coefficients and show that it can be difficult to characterize the nature of the bias intuitively. Both the magnitude and the direction of the bias may vary between categories of an exposure variable.
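To see why the naive coefficient for one category depends on all true category-specific coefficients, consider a crude Poisson model with cell-mean rates λ_j = exp(β_j) for true category j. The following is a schematic version of such an expression, under the stated independence assumptions; the paper's expressions also cover adjusted models.

```latex
% The rate observed in misclassified category k is a mixture of true rates:
\lambda^*_k \;=\; \mathrm{E}[Y \mid X^* = k] \;=\; \sum_j P(X = j \mid X^* = k)\, e^{\beta_j},
% so the naive coefficient is a misclassification-weighted log-sum:
\beta^*_k \;=\; \log \sum_j \pi_{j \mid k}\, e^{\beta_j}, \qquad \pi_{j \mid k} = P(X = j \mid X^* = k).
```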

8.
In many practical applications, count data exhibit greater or lesser variability than is allowed by the equality of mean and variance, referred to as overdispersion or underdispersion, and several mechanisms, such as zero inflation and mixture, may lead to it. Moreover, dispersion also arises when the count data follow a generalized Poisson or negative binomial distribution that accommodates extra variation not explained by a simple Poisson or binomial model. In this paper, we deal with a class of two-component zero-inflated generalized Poisson mixture regression models to fit such data and propose a local influence measure procedure for model comparison and statistical diagnostics. We first formally develop a general model framework that unifies zero inflation, mixture, and overdispersion/underdispersion simultaneously; we then investigate two types of perturbation schemes, global and individual, for perturbing various model assumptions and detecting influential observations, and we obtain the corresponding local influence measures. Our method is novel for count data analysis and can be used to explore essential issues such as zero inflation, mixture, and dispersion in zero-inflated generalized Poisson mixture models. On the basis of the model comparison results, we can further conduct more accurate sensitivity analyses of the perturbations as well as hypothesis tests. Finally, we use a simulation study and a real example to illustrate the proposed local influence measures. Copyright © 2012 John Wiley & Sons, Ltd.
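For reference, one common parametrization of the zero-inflated generalized Poisson (ZIGP) building block, written here in Consul's GP(θ, δ) form, which may differ from the paper's notation, is:

```latex
% Generalized Poisson pmf (Consul): \delta > 0 gives overdispersion,
% \delta < 0 underdispersion; mean \theta/(1-\delta):
P(Y = y) = \frac{\theta(\theta + \delta y)^{y-1} e^{-\theta - \delta y}}{y!}, \qquad y = 0, 1, 2, \ldots
% Zero inflation mixes in a point mass at zero with probability \omega:
P(Y = 0) = \omega + (1 - \omega)\, e^{-\theta}, \qquad
P(Y = y) = (1 - \omega)\, \frac{\theta(\theta + \delta y)^{y-1} e^{-\theta - \delta y}}{y!}, \quad y \ge 1.
```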

9.
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, since comparing diseased with non-diseased subjects while minimizing bias is of central importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment-of-the-night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence between Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. This re-derivation allows the synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear generalized estimating equations (GEE) models for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis.
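The likelihood equivalence invoked here is the classical one: for subject i with follow-up time t_i, event indicator d_i ∈ {0,1}, and exponential hazard λ_i = exp(x_i'β), the exponential survival log-likelihood agrees with a Poisson log-likelihood with offset log t_i up to terms free of β.

```latex
% Exponential survival contribution:
\ell_i(\beta) = d_i \log \lambda_i - \lambda_i t_i, \qquad \lambda_i = e^{x_i'\beta}
% Poisson model d_i \sim \mathrm{Poisson}(\mu_i) with log link and offset \log t_i:
\log \mu_i = \log t_i + x_i'\beta \;\Rightarrow\; \mu_i = \lambda_i t_i
% Poisson log-likelihood (d_i \in \{0,1\}, so \log d_i! = 0):
d_i \log \mu_i - \mu_i = d_i \log t_i + d_i \log \lambda_i - \lambda_i t_i,
% which differs from \ell_i(\beta) only by d_i \log t_i, a constant in \beta.
```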

10.
We examine the impact of nondifferential outcome misclassification on odds ratios estimated from pair-matched case-control studies and propose a Bayesian model to adjust these estimates for misclassification bias. The model relies on access to a validation subgroup with confirmed outcome status for all case-control pairs as well as prior knowledge about the positive and negative predictive values of the classification mechanism. We illustrate the model's performance on simulated data and apply it to a database study examining the presence of ten morbidities in the prodromal phase of multiple sclerosis.
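The identity that makes predictive-value priors usable here is the law-of-total-probability relation between the true and classified outcomes, a textbook relation stated in terms of PPV and NPV rather than sensitivity and specificity (not the paper's full matched-pair model):

```latex
% With PPV = P(Y = 1 \mid Y^* = 1) and NPV = P(Y = 0 \mid Y^* = 0):
P(Y = 1) = \mathrm{PPV} \cdot P(Y^* = 1) + (1 - \mathrm{NPV}) \cdot P(Y^* = 0)
```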

11.
Bayesian methods are proposed for analysing matched case-control studies in which a binary exposure variable is sometimes measured with error, but whose correct values have been validated for a random sample of the matched case-control sets. Three models are considered. Model 1 makes few assumptions other than randomness and independence between matched sets, while Models 2 and 3 are logistic models, with Model 3 making additional distributional assumptions about the variation between matched sets. With Models 1 and 2 the data are examined in two stages. The first stage analyses data from the validation sample and is easy to perform; the second stage analyses the main body of data and requires MCMC methods. All relevant information is transferred between the stages by using the posterior distributions from the first stage as the prior distributions for the second stage. With Model 3, a hierarchical structure is used to model the relationship between the exposure probabilities of the matched sets, which gives the potential to extract more information from the data. All the methods that are proposed are generalized to studies in which there is more than one control for each case. The Bayesian methods and a maximum likelihood method are applied to a data set for which the exposure of every patient was measured using both an imperfect measure that is subject to misclassification, and a much better measure whose classifications may be treated as correct. To test the methods, the latter information was suppressed for all but a random sample of matched sets.

12.
Logistic regression is the standard method for assessing predictors of disease. In logistic regression analyses, a stepwise strategy is often adopted to choose a subset of variables, and inference about the predictors is then based on the chosen model, constructed from only the retained variables. This approach ignores both the variables not selected by the procedure and the uncertainty due to the variable selection procedure itself. This limitation may be addressed by adopting a Bayesian model averaging approach, which considers a subset of all possible models and uses the posterior probabilities of these models to perform all inferences and predictions. This study compares the Bayesian model averaging approach with stepwise procedures for selecting predictor variables in logistic regression, using simulated data sets and the Framingham Heart Study data. The results show that in most cases Bayesian model averaging selects the correct model and outperforms stepwise approaches at predicting an event of interest.
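The averaging step can be summarized by the standard Bayesian model averaging identities (generic formulas, not specific to this study):

```latex
% Posterior probability of model M_k given data D:
P(M_k \mid D) = \frac{P(D \mid M_k)\, P(M_k)}{\sum_{l} P(D \mid M_l)\, P(M_l)}
% Inference about a quantity \Delta averages over the candidate models:
P(\Delta \mid D) = \sum_{k} P(\Delta \mid M_k, D)\, P(M_k \mid D)
```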

13.
Artificial neural networks (ANNs) are being used increasingly for the prediction of clinical outcomes and classification of disease phenotypes. A lack of understanding of the statistical principles underlying ANNs has led to widespread misuse of these tools in the biomedical arena. In this paper, the authors compare the performance of ANNs with that of conventional linear logistic regression models in an epidemiological study of infant wheeze. Data on the putative risk factors for infant wheeze were obtained from a sample of 7318 infants taking part in the Avon Longitudinal Study of Parents and Children (ALSPAC). The data were analysed using logistic regression models and ANNs, and performance, based on misclassification rates in a validation data set, was compared. Misclassification rates in the training data set decreased as the complexity of the ANN increased: h = 0: 17.9%; h = 2: 16.2%; h = 5: 14.9%; and h = 10: 9.2%. However, the more complex models did not generalise well to new data sets drawn from the same population, with validation data set misclassification rates of h = 0: 17.9%; h = 2: 19.6%; h = 5: 20.2%; and h = 10: 22.9%. There is no evidence from this study that ANNs outperform conventional methods of analysing epidemiological data; increasing the complexity of the models serves only to overfit the model to the data. It is important that a validation or test data set be used to assess the performance of highly complex ANNs, to avoid overfitting.
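The train-versus-validation pattern reported here is easy to reproduce in miniature. The sketch below uses synthetic data (not ALSPAC) and scikit-learn models as stand-ins for the authors' ANNs, comparing logistic regression with multilayer perceptrons of increasing hidden-layer size h on held-out data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)

# Synthetic stand-in for risk-factor data: a binary outcome driven
# linearly by a few of many noisy predictors (illustrative only).
n, p = 2000, 20
X = rng.normal(size=(n, p))
logit = 0.8 * X[:, 0] - 0.6 * X[:, 1] + 0.4 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, random_state=0)

# h = 0 hidden units corresponds to plain logistic regression; larger h
# tends to fit the training data better but generalize worse.
models = {"h=0 (logistic)": LogisticRegression(max_iter=1000)}
for h in (2, 5, 10):
    models[f"h={h} (ANN)"] = MLPClassifier(hidden_layer_sizes=(h,),
                                           max_iter=2000, random_state=0)
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(f"{name}: train err {1 - m.score(X_tr, y_tr):.3f}, "
          f"validation err {1 - m.score(X_va, y_va):.3f}")
```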

14.
Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation-study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.
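A schematic version of the main-study likelihood under differential outcome misclassification, consistent with the abstract, is shown below; the full likelihood also includes validation-study terms in which both Y and Y* are observed, which identify the sensitivity and specificity.

```latex
% True-outcome model and covariate-dependent misclassification rates:
p(x) = P(Y = 1 \mid x) = \operatorname{expit}(x'\beta), \qquad
Se(x) = P(Y^* = 1 \mid Y = 1, x), \quad Sp(x) = P(Y^* = 0 \mid Y = 0, x)
% Probability of an observed (error-prone) positive outcome:
P(Y^* = 1 \mid x) = Se(x)\, p(x) + \{1 - Sp(x)\}\,\{1 - p(x)\}
% Main-study log-likelihood over subjects with only Y^* observed:
\ell = \sum_i \big[ y_i^* \log P(Y^* = 1 \mid x_i) + (1 - y_i^*) \log\{1 - P(Y^* = 1 \mid x_i)\} \big]
```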

15.
Array comparative genomic hybridization (aCGH) provides genome-wide information on DNA copy number that is potentially useful for disease classification. One immediate problem is that the data contain many features (probes) but only a few samples. Existing approaches to overcoming this problem include feature selection, ridge regression, and partial least squares. However, these methods typically ignore the spatial characteristics of aCGH data. To make explicit use of this spatial information, we develop a procedure called the smoothed logistic regression (SLR) model. The procedure is based on a mixed logistic regression model, where the random component is a mixture distribution that controls smoothness and sparseness. Conceptually such a procedure is straightforward, but its implementation is complicated by computational problems. We develop a fast and reliable iterative weighted least-squares algorithm based on the singular value decomposition. Simulated data and two real data sets are used to illustrate the procedure. For the real data sets, error rates are calculated using the leave-one-out cross-validation procedure. For both the simulated and real data examples, SLR achieves better misclassification error rates than previous methods. Copyright © 2009 John Wiley & Sons, Ltd.
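The core computational engine, a penalized iteratively reweighted least-squares fit of a logistic model, can be sketched as follows. This simplified version uses a plain ridge penalty and a direct linear solve, whereas SLR's random-effect prior controls both smoothness and sparseness and its solver exploits the SVD for many-probe data.

```python
import numpy as np

def ridge_logistic_irls(X, y, lam=1.0, n_iter=50):
    """Ridge-penalized logistic regression by iteratively reweighted
    least squares: a simplified stand-in for the SLR fit."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))
        w = mu * (1 - mu)                          # IRLS weights
        z = eta + (y - mu) / np.maximum(w, 1e-10)  # working response
        A = X.T @ (w[:, None] * X) + lam * np.eye(p)
        beta_new = np.linalg.solve(A, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < 1e-8:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Toy use: 30 samples, 100 probes (p >> n); the ridge term keeps the
# linear solve well posed despite having more features than samples.
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 100))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
print(ridge_logistic_irls(X, y, lam=5.0)[:3])
```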

16.
17.
In recent decades, multilevel regression and poststratification (MRP) has surged in popularity for population inference. However, the validity of the estimates can depend on details of the model, and there is currently little research on validation. We explore how leave-one-out cross-validation (LOO) can be used to compare Bayesian models for MRP. We investigate two approximate calculations of LOO: Pareto-smoothed importance sampling (PSIS-LOO) and a survey-weighted alternative (WTD-PSIS-LOO). Using two simulation designs, we examine how accurately these two criteria recover the correct ordering of model goodness at predicting population and small-area estimands. Focusing first on variable selection, we find that neither PSIS-LOO nor WTD-PSIS-LOO correctly recovers the models' order for an MRP population estimand, although both criteria correctly identify the best and worst models. When considering small-area estimation, the best model differs across small areas, highlighting the complexity of MRP validation. When considering different priors, the models' order is recovered slightly better at smaller-area levels. These findings suggest that, while not terrible, PSIS-LOO-based ranking techniques may not be suitable for evaluating MRP as a method. We suggest this is due to the aggregation stage of MRP, where individual-level prediction errors average out. We validate these results by applying the approach to real-world data from the National Health and Nutrition Examination Survey (NHANES) in the United States. Altogether, these results show that PSIS-LOO-based model validation tools need to be used with caution and might not convey the full story when validating MRP as a method.
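For context, the quantity PSIS-LOO approximates is the leave-one-out expected log predictive density, estimated from S posterior draws θ^(s) with Pareto-smoothed importance weights (the standard Vehtari–Gelman–Gabry construction; WTD-PSIS-LOO additionally folds in survey weights):

```latex
% LOO expected log predictive density, summed over observations:
\widehat{\mathrm{elpd}}_{\mathrm{loo}} = \sum_{i=1}^{n} \log \hat p(y_i \mid y_{-i}), \qquad
\hat p(y_i \mid y_{-i}) = \frac{\sum_{s=1}^{S} w_i^{s}\, p(y_i \mid \theta^{(s)})}{\sum_{s=1}^{S} w_i^{s}}
% where the raw importance ratios r_i^{s} \propto 1 / p(y_i \mid \theta^{(s)})
% are stabilized by Pareto smoothing of their largest values to give w_i^{s}.
```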

18.
Internal validation data offer a well-recognized means to help correct for exposure misclassification or measurement error. When available, external validation data offer the advantage of cost-effectiveness. However, external data are a generally inefficient source of information about misclassification parameters. Furthermore, external data are not necessarily "transportable", for example, if there are differences in the design or target populations of the main and validation studies. Recent work has suggested weighted estimators to simultaneously take advantage of internal and external validation data. We explore efficiency and transportability in the fundamental case of estimating the odds ratio for binary exposure in a case-control setting. Our results support the use of closed-form weighted log odds ratio estimators in place of computationally demanding maximum likelihood estimators under both types of validation study designs (using internal data only, and combining internal and external data). We also provide and assess a formal test of the transportability assumption, and introduce a new log odds ratio estimator that is inherently robust to violation of that assumption. A case-control study of the association between maternal antibiotic use and sudden infant death syndrome provides a real-data example.
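A minimal sketch of the weighted-estimator idea, written here as generic inverse-variance weighting of a corrected estimate based on internal validation data and one based on external validation data; the paper's closed-form weights may differ from this simple form.

```latex
% Inverse-variance weighted combination of two corrected estimators:
\hat\beta_w = \frac{w_I\, \hat\beta_I + w_E\, \hat\beta_E}{w_I + w_E},
\qquad w_I = 1 / \widehat{\mathrm{Var}}(\hat\beta_I), \quad w_E = 1 / \widehat{\mathrm{Var}}(\hat\beta_E)
% \hat\beta_I: log odds ratio corrected using internal validation data only;
% \hat\beta_E: log odds ratio corrected using external validation data,
% whose consistency requires the transportability assumption.
```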

19.
In observational studies, misclassification of exposure is ubiquitous and can substantially bias the estimated association between an outcome and an exposure. Although misclassification in a single observational study has been well studied, few papers have considered it in a meta-analysis. Meta-analyses of observational studies provide important evidence for health policy decisions, especially when large randomized controlled trials are unethical or unavailable. It is imperative to account properly for misclassification in a meta-analysis to obtain valid point and interval estimates. In this paper, we propose a novel Bayesian approach to filling this methodological gap. We simultaneously synthesize two (or more) meta-analyses, with one on the association between a misclassified exposure and an outcome (main studies), and the other on the association between the misclassified exposure and the true exposure (validation studies). We extend the current scope for using external validation data by relaxing the "transportability" assumption by means of random effects models. Our model accounts for heterogeneity between studies and can be extended to allow different studies to have different exposure measurements. The proposed model is evaluated through simulations and illustrated using real data from a meta-analysis of the effect of cigarette smoking on diabetic peripheral neuropathy.
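One way to write the random-effects relaxation of transportability described here — a hedged sketch, not the authors' exact specification — is to let each validation study j carry its own sensitivity and specificity drawn from a common distribution:

```latex
% Study-specific misclassification parameters with between-study heterogeneity:
\operatorname{logit}(Se_j) \sim N(\mu_{Se}, \tau_{Se}^2), \qquad
\operatorname{logit}(Sp_j) \sim N(\mu_{Sp}, \tau_{Sp}^2)
% Main-study associations with the misclassified exposure are then linked to
% the true-exposure association through (Se_j, Sp_j) rather than through a
% single shared ("transportable") pair of values.
```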

20.
Paradigms for substance abuse cue-reactivity research involve pharmacological or stressful stimulation designed to elicit stress and craving responses in cocaine-dependent subjects. It is unclear whether stress induced by participation in such studies increases drug-seeking behavior. We propose a 2-state hidden Markov model for the number of instances of cocaine use per week before and after participation in a stress- and cue-reactivity study. The hypothesized latent state corresponds to 'high' or 'low' use. To account for a preponderance of zeros, we assume a zero-inflated Poisson model for the count data. Transition probabilities depend on the prior week's state, fixed demographic variables, and time-varying covariates. We adopt a Bayesian approach to model fitting and use the conditional predictive ordinate statistic to demonstrate that the zero-inflated Poisson hidden Markov model outperforms other models for longitudinal count data.
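A minimal generative sketch of this kind of model is given below. The parameter values are hypothetical, and the actual model additionally lets transition probabilities depend on demographic and time-varying covariates rather than being fixed.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical parameters for a 2-state ('low'/'high' use) hidden Markov
# model with zero-inflated Poisson emissions for weekly use counts.
P = np.array([[0.85, 0.15],   # transition probabilities between latent states
              [0.30, 0.70]])
lam = np.array([0.5, 6.0])    # Poisson means in the low- and high-use states
pi0 = np.array([0.6, 0.1])    # extra (structural) zero probability per state

def simulate_weeks(n_weeks, start=0):
    """Simulate latent states and observed weekly counts under the ZIP-HMM."""
    states, counts = [start], []
    for t in range(n_weeks):
        if t > 0:
            states.append(rng.choice(2, p=P[states[-1]]))
        s = states[-1]
        zero = rng.uniform() < pi0[s]        # structural zero
        counts.append(0 if zero else rng.poisson(lam[s]))
    return np.array(states), np.array(counts)

states, counts = simulate_weeks(52)
print("weeks in high-use state:", states.sum())
print("share of zero-count weeks:", np.mean(counts == 0).round(2))
```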
