Similar articles
 Found 20 similar articles (search time: 500 ms)
1.
In epidemiologic studies of the association between exposure and disease, misclassification of exposure is common and known to induce bias in the effect estimates. The nature of the bias is difficult to predict. We therefore present a simple method to assess the bias in Poisson regression coefficients for a categorical exposure variable subject to misclassification. We derive expressions for the category-specific coefficients from the regression on the error-prone exposure (naive coefficients) in terms of the coefficients from the regression on the true exposure (true coefficients). These expressions are similar for crude and adjusted models, if we assume that the covariates are measured without error and that the misclassification probabilities are independent of the covariate values. We find that the bias in the naive coefficient for one category of the exposure variable depends on all true category-specific coefficients weighted by misclassification probabilities. On the other hand, misclassification of an exposure variable does not induce bias in the estimates of the coefficients of the (perfectly measured) covariates. Similarities with linear regression models are pointed out. For selected scenarios of true exposure-disease associations and selected patterns of misclassification, we illustrate the inconsistency in naive Poisson regression coefficients and show that it can be difficult to intuitively characterize the nature of the bias. Both the magnitude and the direction of the bias may vary between categories of an exposure variable.
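The dependence of each naive coefficient on all true coefficients can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the authors' derivation: a crude model with no covariates, equal person-time per subject, and nondifferential misclassification. The function name `naive_log_rate_ratios` and its arguments are hypothetical. Under these assumptions, the rate among subjects observed in category j is a weighted average of the true category rates, with weights proportional to P(true = k) × P(observed = j | true = k).

```python
import math


def naive_log_rate_ratios(true_log_rr, prevalence, misclass):
    """Large-sample naive log rate ratios for a misclassified categorical exposure.

    true_log_rr[k] : true log rate ratio of category k vs. category 0 (entry 0 is 0.0)
    prevalence[k]  : P(true category = k)
    misclass[k][j] : P(observed category = j | true category = k); rows sum to 1
    Assumes every observed category has positive probability.
    """
    n = len(true_log_rr)
    true_rr = [math.exp(b) for b in true_log_rr]
    naive_rates = []
    for j in range(n):
        # Joint probability of (true = k, observed = j) for each true category k.
        weights = [prevalence[k] * misclass[k][j] for k in range(n)]
        total = sum(weights)
        # Rate in observed category j: mixture of the true category rates.
        naive_rates.append(sum(w * r for w, r in zip(weights, true_rr)) / total)
    # Express each observed-category rate relative to observed category 0.
    return [math.log(rate / naive_rates[0]) for rate in naive_rates]


if __name__ == "__main__":
    # Hypothetical three-category scenario with misclassification between neighbours.
    true_beta = [0.0, math.log(2.0), math.log(4.0)]
    prev = [0.5, 0.3, 0.2]
    p = [[0.8, 0.2, 0.0],
         [0.1, 0.8, 0.1],
         [0.0, 0.2, 0.8]]
    print(naive_log_rate_ratios(true_beta, prev, p))
```

With an identity misclassification matrix the function returns the true coefficients unchanged, which is a quick sanity check on any scenario.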

2.
Misclassification in a binary exposure variable within an unmatched prospective study may lead to a biased estimate of the disease-exposure relationship. It usually gives falsely small credible intervals because uncertainty in the recorded exposure is not taken into account. When there are several other perfectly measured covariates, interrelationships may introduce further potential for bias. Bayesian methods are proposed for analysing binary outcome studies in which an exposure variable is sometimes misclassified, but its correct values have been validated for a random subsample of the subjects. This Bayesian approach can model relationships between explanatory variables and between explanatory variables and the probabilities of misclassification. Three logistic regressions are used to relate disease to true exposure, misclassified exposure to true exposure and true exposure to other covariates. Credible intervals may be used to make decisions about whether certain parameters are unnecessary and hence whether the model can be reduced in complexity. In the disease-exposure model, for parameters representing coefficients related to perfectly measured covariates, the precision of posterior estimates is only slightly lower than would be found from data with no misclassification. For the risk factor which has misclassification, the estimates of model coefficients obtained are much less biased than those with misclassification ignored.

3.
Motivated by a longitudinal oral health study, we evaluate the performance of binary Markov models in which the response variable is subject to an unconstrained misclassification process and follows a monotone or progressive behavior. Theoretical and empirical arguments show that the simple version of the model can be used to estimate the prevalence, incidences, and misclassification parameters without the need for external information and that the incidence estimators associated with the model outperform approaches previously proposed in the literature. We propose an extension of the simple version of the binary Markov model to describe the relationship between the covariates and the prevalence and incidence allowing for different classifiers. We implemented a Bayesian version of the extended model and show that, under the settings of our motivating example, the parameters can be estimated without any external information. Finally, the analyses of the motivating problem are presented.

4.
In genetic association studies, mixed effects models have been widely used in detecting pleiotropic effects, which occur when one gene affects multiple phenotype traits. In particular, bivariate mixed effects models are useful for describing the association of a gene with a continuous trait and a binary trait. However, such models are inadequate for data with response mismeasurement, a characteristic that is often overlooked. It has been well established that in univariate settings, ignoring mismeasurement in variables usually results in biased estimation. In this paper, we consider the setting with a bivariate outcome vector which contains a continuous component and a binary component, both subject to mismeasurement. We propose an induced likelihood approach and an EM algorithm method to handle measurement error in continuous response and misclassification in binary response simultaneously. Simulation studies confirm that the proposed methods successfully remove the bias induced by the response mismeasurement.

5.
Response fatigue can cause measurement error and misclassification problems in survey research. Questions asked later in a long survey are often prone to more measurement error or misclassification. The response given is a function of both the true response and participant response fatigue. We investigate the identifiability of survey order effects and their impact on estimators of treatment effects. The focus is on fatigue that affects a given answer to a question rather than fatigue that causes non-response and missing data. We consider linear, Gamma, and logistic models of response that incorporate both the true underlying response and the effect of question order. For continuous data, survey order effects have no impact on study power under a Gamma model. However, under a linear model that allows for convergence of responses to a common mean, the impact of fatigue on power will depend on how fatigue affects both the rate of mean convergence and the variance of responses. For binary data and for less than a 50% chance of a positive response, order effects cause study power to increase under a linear probability (risk difference) model but decrease under a logistic model. The results suggest that measures designed to reduce survey order effects might have unintended consequences. We present a data example that demonstrates the problem of survey order effects.

6.
Multistate Markov regression models used for quantifying the effect size of state‐specific covariates pertaining to the dynamics of multistate outcomes have gained popularity. However, the measurements of multistate outcomes are prone to classification errors, particularly when a population‐based survey or study relies on proxy measurements of the outcome for cost reasons. Such misclassification may affect the effect size of relevant covariates, such as the odds ratio used in epidemiology. We proposed a Bayesian measurement‐error‐driven hidden Markov regression model for calibrating these biased estimates with and without a 2‐stage validation design. A simulation algorithm was developed to assess various scenarios of underestimation and overestimation given nondifferential misclassification (independent of covariates) and differential misclassification (dependent on covariates). We applied our proposed method to the community‐based survey of androgenetic alopecia and found that the effect sizes of the majority of covariates were inflated after calibration regardless of the type of misclassification. Our proposed Bayesian measurement‐error‐driven hidden Markov regression model is practicable and effective in calibrating the effects of covariates on multistate outcomes, but a prior distribution on measurement errors derived from a 2‐stage validation design is strongly recommended.

7.
It is well known that estimates of association between an outcome variable and a set of categorical covariates, some of which are measured with misclassification, tend to be biased upon application of the usual methods of estimation that ignore the classification error. We propose a method to adjust for misclassification in covariates when one applies the generalized linear model. In the case where one can observe some true covariates only through surrogates, we combine a latent class analysis with the approach to incorporate multiple surrogates into the model. We include discussion on the efficacy of repeated measurements which one can view as a special case of multiple surrogates with identical distribution. We provide two examples to demonstrate the applicability of the method and the efficacy of multiple replicates for a covariate subject to misclassification in a regression framework.

8.
Response misclassification of counted data biases and understates the uncertainty of parameter estimators in Poisson regression models. To correct these problems, researchers have devised classical procedures that rely on asymptotic distribution results and supplemental validation data in order to estimate unknown misclassification parameters. We derive a new Bayesian Poisson regression procedure that accounts and corrects for misclassification for a count variable with two categories. Under the Bayesian paradigm, one can use validation data, expert opinion, or a combination of these two approaches to correct for the consequences of misclassification. The Bayesian procedure proposed here yields an operationally effective way to correct and account for misclassification effects in Poisson count regression models. We demonstrate the performance of the model in a simulation study. Additionally, we analyze two real-data examples and compare our new Bayesian inference method that adjusts for misclassification with a similar analysis that ignores misclassification.

9.
We consider Cox proportional hazards regression when the covariate vector includes error-prone discrete covariates along with error-free covariates, which may be discrete or continuous. The misclassification in the discrete error-prone covariates is allowed to be of any specified form. Building on the work of Nakamura and his colleagues, we present a corrected score method for this setting. The method can handle all three major study designs (internal validation design, external validation design, and replicate measures design), both functional and structural error models, and time-dependent covariates satisfying a certain 'localized error' condition. We derive the asymptotic properties of the method and indicate how to adjust the covariance matrix of the regression coefficient estimates to account for estimation of the misclassification matrix. We present the results of a finite-sample simulation study under Weibull survival with a single binary covariate having known misclassification rates. The performance of the method described here was similar to that of related methods we have examined in previous works. Specifically, our new estimator performed as well as or, in a few cases, better than the full Weibull maximum likelihood estimator. We also present simulation results for our method for the case where the misclassification probabilities are estimated from an external replicate measures study. Our method generally performed well in these simulations. The new estimator has a broader range of applicability than many other estimators proposed in the literature, including those described in our own earlier work, in that it can handle time-dependent covariates with an arbitrary misclassification structure. We illustrate the method on data from a study of the relationship between dietary calcium intake and distal colon cancer.

10.
A time‐varying latent variable model is proposed to jointly analyze multivariate mixed‐support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which are the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent of the covariate distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state‐specific distributions for the covariates, with the aim of improving the recovery of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation‐maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data.

11.
We describe a methodology for analysing transitions over time in a binary outcome variable that is subject to misclassification (that is, measurement error). Logistic regression models for transition events in the true underlying state are combined with estimates of probabilities of misclassification of the underlying state. The model is based on the Markovian assumption that the probabilities of transition in the underlying state at a given time depend only on the underlying state at the previous time. Hence we estimate odds-ratio effects for transitions that are adjusted for the effect of misclassification. Comparing these adjusted estimates with estimates that are obtained without taking misclassification into account indicates that the latter can be biased either toward or away from the null. For the estimates to exist, certain restrictions on the observed data and misclassification probabilities need to be met. If these restrictions are not satisfied then the conclusion from the analysis is that all observed transition events can be explained solely by the error in outcome assessment, in which case it is likely that an aspect of the model is incorrect. The motivation for this work comes from an analysis of transitions in depression status for a cohort of Australian teenagers participating in a longitudinal study of adolescent health.

12.
In studies of older adults, researchers often recruit proxy respondents, such as relatives or caregivers, when study participants cannot provide self‐reports (e.g., because of illness). Proxies are usually only sought to report on behalf of participants with missing self‐reports; thus, either a participant self‐report or proxy report, but not both, is available for each participant. Furthermore, the missing‐data mechanism for participant self‐reports is not identifiable and may be nonignorable. When exposures are binary and participant self‐reports are conceptualized as the gold standard, substituting error‐prone proxy reports for missing participant self‐reports may produce biased estimates of outcome means. Researchers can handle this data structure by treating the problem as one of misclassification within the stratum of participants with missing self‐reports. Most methods for addressing exposure misclassification require validation data, replicate data, or an assumption of nondifferential misclassification; other methods may result in an exposure misclassification model that is incompatible with the analysis model. We propose a model that makes none of the aforementioned requirements and still preserves model compatibility. Two user‐specified tuning parameters encode the exposure misclassification model. Two proposed approaches estimate outcome means standardized for (potentially) high‐dimensional covariates using multiple imputation followed by propensity score methods. The first method is parametric and uses maximum likelihood to estimate the exposure misclassification model (i.e., the imputation model) and the propensity score model (i.e., the analysis model); the second method is nonparametric and uses boosted classification and regression trees to estimate both models. We apply both methods to a study of elderly hip fracture patients. Copyright © 2014 John Wiley & Sons, Ltd.

13.
Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource-limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on the scenario, a higher premium may be placed on avoiding false positives, which bring greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk. We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics compared with methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.
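To make the notion of weighted misclassification loss concrete, the sketch below picks a threshold for a score that is already given. Note that this is the simpler "score first, cutoff second" approach the abstract compares against, not the authors' joint procedure, and it involves no ensemble learner or cross-validation. The function names and the weights `fp_weight` and `fn_weight` are hypothetical.

```python
def weighted_loss(scores, labels, threshold, fp_weight, fn_weight):
    """Average weighted misclassification loss of the rule 'predict 1 if score >= threshold'."""
    loss = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            loss += fp_weight  # false positive
        elif predicted == 0 and label == 1:
            loss += fn_weight  # false negative
    return loss / len(labels)


def best_threshold(scores, labels, fp_weight, fn_weight):
    """Return the candidate threshold with the smallest weighted loss.

    Candidates are the observed scores plus +infinity (the rule that never predicts 1).
    Ties are broken in favour of the smallest threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: weighted_loss(scores, labels, t, fp_weight, fn_weight),
    )


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(best_threshold(scores, labels, fp_weight=1.0, fn_weight=1.0))
    print(best_threshold(scores, labels, fp_weight=5.0, fn_weight=1.0))
```

Raising `fp_weight` relative to `fn_weight` pushes the selected threshold upward, which mirrors the scenario in which false positives carry the greater cost.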

14.
Recent studies have shown how cost-effectiveness analysis can be undertaken in a regression framework. This contribution explores the use of practical regression models for estimating cost-effectiveness from a Bayesian perspective. Two different Bayesian models are described. The first considers the outcome measure to be a quantitative variable. In the second model the individual outcome measure is a binary variable with value 1 if any objective has been achieved. We describe the implementation of the model using data from a trial that compares two highly active antiretroviral therapies in HIV asymptomatic patients. Data on direct cost and effectiveness (percentage of patients with undetectable viral load and quality of life) were recorded. If we consider the quality of life as an effectiveness measure, the new treatment is preferred for a willingness to pay of more than Euro 142.3 for an increase in the quality of life. For illustrative purposes, if we compare the results with an analogous model that does not include covariates, the critical value becomes Euro 247.4. For the binary measure of effectiveness the control treatment dominates the new treatment.

15.
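Accounting for time-dependent variables in prediction models can improve overall model performance and increase clinical utility. Approaches based on traditional regression strategies, such as landmark models and joint models, are limited in the number of time-dependent variables they can handle and in the settings to which they apply, whereas machine learning algorithms such as neural networks may handle such variables more flexibly. This article summarizes how traditional models and machine learning algorithms each incorporate time-dependent variables, reviews the settings in which each method is applicable, and outlines the problems that remain with existing methods, with the aim of providing methodological guidance for handling time-dependent variables in future prediction modelling.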

16.
We develop a simulation‐based procedure for determining the required sample size in binomial regression risk assessment studies when response data are subject to misclassification. A Bayesian average power criterion is used to determine a sample size that provides high probability, averaged over the distribution of potential future data sets, of correctly establishing the direction of association between predictor variables and the probability of event occurrence. The method is broadly applicable to any parametric binomial regression model including, but not limited to, the popular logistic, probit, and complementary log–log models. We detail a common medical scenario wherein ascertainment of true disease status is impractical or otherwise impeded, and in its place the outcome of a single binary diagnostic test is used as a surrogate. These methods are then extended to the two diagnostic test setting. We illustrate the method with categorical covariates using one example that involves screening for human papillomavirus. This example coupled with results from simulated data highlights the utility of our Bayesian sample size procedure with error prone measurements. Copyright © 2008 John Wiley & Sons, Ltd.

17.
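Poisson regression models for chain binomial data on infectious diseases   Cited by: 1 (self-citations: 0, citations by others: 1)
Objective: To introduce the application of Poisson regression models to the analysis of infectious disease data with a chain structure. Methods: Using maximum likelihood, five Poisson regression models were fitted to common cold data from five-person households, with "generation" entered into the model as dummy variables and "number of already infected persons" entered as a continuous indicator variable. Results: The models presented include the Greenwood and Reed-Frost chain binomial models as special cases, and the results from the Poisson regression models are often very close to those from the corresponding chain binomial models. Conclusion: Poisson regression models offer a simpler and more practical way to handle and analyse chain binomial data on infectious diseases, and the effects of covariates can be analysed at the same time when needed.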
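For readers unfamiliar with the two chain binomial models named in this abstract, the sketch below computes the probability of a household infection chain under each. It uses the textbook definitions (under Reed-Frost a susceptible escapes infection with probability q per infective; under Greenwood the escape probability is q whenever at least one infective is present) and does not reproduce the Poisson regression fit described by the authors. The function names and the escape probability `q` are hypothetical.

```python
from math import comb


def infection_prob(infectives, q, model="reed-frost"):
    """Per-susceptible infection probability in one generation.

    q is the probability of escaping infection from a single infective.
    """
    if infectives == 0:
        return 0.0
    if model == "reed-frost":
        return 1.0 - q ** infectives  # risk grows with the number of infectives
    if model == "greenwood":
        return 1.0 - q  # risk does not depend on how many infectives there are
    raise ValueError(f"unknown model: {model}")


def chain_probability(chain, household_size, q, model="reed-frost"):
    """Probability of a chain such as [1, 2, 0] (new cases per generation).

    chain[0] is the number of introductory cases; a complete chain ends with 0.
    """
    susceptibles = household_size - chain[0]
    prob = 1.0
    for infectives, new_cases in zip(chain, chain[1:]):
        if new_cases > susceptibles:
            return 0.0  # impossible chain
        p = infection_prob(infectives, q, model)
        # Binomial probability of new_cases infections among the remaining susceptibles.
        prob *= comb(susceptibles, new_cases) * p ** new_cases * (1.0 - p) ** (susceptibles - new_cases)
        susceptibles -= new_cases
    return prob


if __name__ == "__main__":
    for model in ("reed-frost", "greenwood"):
        print(model, chain_probability([1, 2, 0], household_size=5, q=0.8, model=model))
```

The two models agree for any generation with a single infective and diverge only when several infectives are present at once, which is why chains such as 1 → 2 → 0 separate them.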

18.
This paper concerns the statistical analysis of certain binary data arising in molecular studies of cancer. In allelic-loss experiments, tumour cell genomes are analysed at informative molecular marker loci to identify deleted chromosomal regions. The resulting binary data are used to infer properties of putative suppressor genes, genes involved in normal cell cycling. Various factors can complicate this inference, including background loss of heterozygosity, spatial (that is, within chromosome) dependence of the binary responses, non-informativeness of markers, covariates such as protein levels or tumour histology, heterogeneity of cells within tumours, and measurement error. We focus on the first three factors, discussing methods for statistical inference that separate background loss from significant loss. We outline the extension to other inferences, such as comparison questions and the relationship to covariates. Using characteristic features of tumourigenesis, we present a framework for the stochastic modelling of allelic-loss data, and build models within this framework; in particular, we propose a simple model that has chromosome breaks at locations of a Poisson process, and preferential selection of cells with inactivated suppressor genes. We illustrate these methods on allelic-loss data from induced rat mammary tumours and human bladder cancers. © 1998 John Wiley & Sons, Ltd.

19.
The potential for bias due to misclassification error in regression analysis is well understood by statisticians and epidemiologists. Assuming little or no available data for estimating misclassification probabilities, investigators sometimes seek to gauge the sensitivity of an estimated effect to variations in the assumed values of those probabilities. We present an intuitive and flexible approach to such a sensitivity analysis, assuming an underlying logistic regression model. For outcome misclassification, we argue that a likelihood‐based analysis is the cleanest and the most preferable approach. In the case of covariate misclassification, we combine observed data on the outcome, error‐prone binary covariate of interest, and other covariates measured without error, together with investigator‐supplied values for sensitivity and specificity parameters, to produce corresponding positive and negative predictive values. These values serve as estimated weights to be used in fitting the model of interest to an appropriately defined expanded data set using standard statistical software. Jackknifing provides a convenient tool for incorporating uncertainty in the estimated weights into valid standard errors to accompany log odds ratio estimates obtained from the sensitivity analysis. Examples illustrate the flexibility of this unified strategy, and simulations suggest that it performs well relative to a maximum likelihood approach carried out via numerical optimization. Copyright © 2010 John Wiley & Sons, Ltd.
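The step from investigator-supplied sensitivity and specificity to predictive values follows from Bayes' rule. The sketch below shows it for a single stratum, given the observed proportion classified as exposed. It is a simplified illustration, not the authors' procedure, which conditions on the outcome and the other covariates and then uses the predictive values as weights in an expanded data set. The function name and arguments are hypothetical.

```python
def predictive_values(observed_prevalence, sensitivity, specificity):
    """Positive and negative predictive values implied by assumed sensitivity/specificity.

    observed_prevalence : proportion classified as exposed by the error-prone measure
    Returns (ppv, npv) for one stratum.
    """
    if not 0.0 < observed_prevalence < 1.0:
        raise ValueError("observed prevalence must lie strictly between 0 and 1")
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    # Invert: observed = se * true + (1 - sp) * (1 - true).
    true_prevalence = (observed_prevalence + specificity - 1.0) / denom
    if not 0.0 <= true_prevalence <= 1.0:
        raise ValueError("assumed sensitivity/specificity are incompatible with the data")
    ppv = sensitivity * true_prevalence / observed_prevalence
    npv = specificity * (1.0 - true_prevalence) / (1.0 - observed_prevalence)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical inputs: 30% observed exposed, sensitivity 0.9, specificity 0.8.
    print(predictive_values(0.3, sensitivity=0.9, specificity=0.8))
```

The incompatibility check matters in a sensitivity analysis: some assumed sensitivity and specificity pairs imply a true prevalence outside [0, 1] and should be excluded from the grid of values explored.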

20.
Eberly LE, Carlin BP. Statistics in Medicine 2000; 19(17-18): 2279-2294.
The marked increase in popularity of Bayesian methods in statistical practice over the last decade owes much to the simultaneous development of Markov chain Monte Carlo (MCMC) methods for the evaluation of requisite posterior distributions. However, along with this increase in computing power has come the temptation to fit models larger than the data can readily support, meaning that often the propriety of the posterior distributions for certain parameters depends on the propriety of the associated prior distributions. An important example arises in spatial modelling, wherein separate random effects for capturing unstructured heterogeneity and spatial clustering are of substantive interest, even though only their sum is well identified by the data. Increasing the informative content of the associated prior distributions offers an obvious remedy, but one that hampers parameter interpretability and may also significantly slow the convergence of the MCMC algorithm. In this paper we investigate the relationship among identifiability, Bayesian learning and MCMC convergence rates for a common class of spatial models, in order to provide guidance for prior selection and algorithm tuning. We are able to elucidate the key issues with relatively simple examples, and also illustrate the varying impacts of covariates, outliers and algorithm starting values on the resulting algorithms and posterior distributions.
