Retrieved 20 similar documents (search time: 0 ms).
1.
When assessing association between a binary trait and some covariates, the binary response may be subject to unidirectional misclassification. Unidirectional misclassification can occur when revealing a particular level of the trait carries a cost, such as a social desirability or financial cost. The feasibility of addressing misclassification is commonly obscured by model identification issues. This paper studies the efficacy of inference when the binary response variable is subject to unidirectional misclassification. From a theoretical perspective, we demonstrate that the key model parameters are identifiable, except for the case with a single binary covariate. From a practical standpoint, the logistic model with quantitative covariates can be weakly identified, in the sense that the Fisher information matrix may be near singular. This can make learning some parameters difficult under certain parameter settings, even with quite large samples. In other cases, the stronger identification enables the model to provide more effective adjustment for unidirectional misclassification. An extension to the Poisson approximation of the binomial model establishes the identifiability of the Poisson and zero-inflated Poisson models. For fully identified models, the proposed method adjusts for misclassification by learning from the data. For weakly identified binary models, the method is useful for sensitivity analyses of the potential impact of unidirectional misclassification.
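A minimal formalization of the unidirectional setup, in our own notation (an illustrative assumption, not the paper's exact model): suppose only false negatives occur, with probability $\gamma$. The observed response $Y^*$ then follows

$$P(Y^* = 1 \mid x) = (1 - \gamma)\,\frac{\exp(\beta_0 + \beta^\top x)}{1 + \exp(\beta_0 + \beta^\top x)}.$$

With a single binary covariate the data supply only two observed success probabilities, too few to determine the three parameters $(\beta_0, \beta_1, \gamma)$, which is consistent with the identifiability exception noted above.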
2.
We provide a simple analytic correction for risk factor misclassification in a matched case-control study with variable numbers of controls per case. The method is an extension of existing methodology, and involves estimating the corrected proportions of controls and cases in risk factor categories within each matched set. These estimates are then used to calculate the Mantel-Haenszel odds ratio estimate corrected for misclassification. A simulation-based interval estimate is developed. An example is given from a study of risk factors for progression of benign breast disease to breast cancer, in which the risk factor is a biological marker measured with poor sensitivity.
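A sketch of the two core steps, assuming the correction uses known sensitivity and specificity (the standard matrix-method inversion; the function names and the known-(Se, Sp) setup are our assumptions, not the authors' exact procedure):

```python
def corrected_proportion(p_obs, se, sp):
    """Invert p_obs = se*p + (1 - sp)*(1 - p) to recover the true proportion p."""
    return (p_obs + sp - 1.0) / (se + sp - 1.0)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio across 2x2 tables (a, b, c, d), one per matched set."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Example: correct the observed exposure proportion, rebuild the 2x2 counts
# within each matched set, then pool with mantel_haenszel_or.
print(corrected_proportion(0.30, se=0.70, sp=0.95))  # true proportion ~ 0.385
```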
3.
We present an approach that uses latent variable modeling and multiple imputation to correct rater bias when one group of raters tends to be more lenient in assigning a diagnosis than another. Our method assumes that there exists an unobserved moderate category of patient who is assigned a positive diagnosis by one type of rater and a negative diagnosis by the other type. We present a Bayesian random effects censored ordinal probit model that allows us to calibrate the diagnoses across rater types by identifying and multiply imputing 'case' or 'non-case' status for patients in the moderate category. A Markov chain Monte Carlo algorithm is presented to estimate the posterior distribution of the model parameters and generate multiple imputations. Our method enables the calibrated diagnosis variable to be used in subsequent analyses while also preserving uncertainty in true diagnosis. We apply our model to diagnoses of posttraumatic stress disorder (PTSD) from a depression study where nurse practitioners were twice as likely as clinical psychologists to diagnose PTSD despite the fact that participants were randomly assigned to either a nurse or a psychologist. Our model appears to balance PTSD rates across raters, provides a good fit to the data, and preserves between-rater variability. After calibrating the diagnoses of PTSD across rater types, we perform an analysis looking at the effects of comorbid PTSD on changes in depression scores over time. Results are compared with an analysis that uses the original diagnoses and show that calibrating the PTSD diagnoses can yield different inferences.
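One way to formalize the 'moderate category' idea (our sketch, not the authors' exact parameterization): each patient has a latent severity

$$S_i = x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1),$$

and a rater of type $r$ assigns a positive diagnosis when $S_i > \tau_r$. If $\tau_{\text{psych}} > \tau_{\text{nurse}}$, patients with $S_i \in (\tau_{\text{nurse}}, \tau_{\text{psych}}]$ form the moderate category, diagnosed positive by nurses but negative by psychologists; their true case status is what the model multiply imputes.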
4.
Generalized linear mixed model for binary outcomes when covariates are subject to measurement errors and detection limits
Longitudinal measurement of biomarkers is important in determining risk factors for binary endpoints such as infection or disease. However, biomarkers are subject to measurement error, and some are also subject to left-censoring due to a lower limit of detection. Statistical methods to address these issues are few. We herein propose a generalized linear mixed model and estimate the model parameters using the Monte Carlo Newton-Raphson (MCNR) method. Inferences regarding the parameters are made by applying Louis's method and the delta method. Simulation studies were conducted to compare the proposed MCNR method with existing methods including the maximum likelihood (ML) method and the ad hoc approach of replacing the left-censored values with half of the detection limit (HDL). The results showed that the performance of the MCNR method is superior to ML and HDL with respect to the empirical standard error, as well as the coverage probability for the 95% confidence interval. The HDL method relies on an incorrect imputation, and both it and the ML method are constrained by the number of quadrature points; the MCNR method has no such limitation and approximates the likelihood function better than the other methods. The improvement of the MCNR method is further illustrated with real-world data from a longitudinal study of local cervicovaginal HIV viral load and its effects on oncogenic HPV detection in HIV-positive women.
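A hedged sketch of why left-censoring changes the likelihood (our notation): if a normally distributed biomarker $W_{ij}$ with conditional mean $\mu_{ij}(b_i)$ is observed only above the detection limit $d$, a censored observation contributes the probability

$$P(W_{ij} < d \mid b_i) = \Phi\!\left(\frac{d - \mu_{ij}(b_i)}{\sigma}\right)$$

rather than a density value, whereas the ad hoc HDL approach simply plugs in $d/2$ for the unobserved measurement.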
5.
A hidden Markov model approach to analyze longitudinal ternary outcomes when some observed states are possibly misclassified
Julia S. Benoit, Wenyaw Chan, Sheng Luo, Hung-Wen Yeh, Rachelle Doody. Statistics in Medicine 2016;35(9):1549-1557.
Understanding the dynamic disease process is vital in early detection, diagnosis, and measuring progression. Continuous-time Markov chain (CTMC) methods have been used to estimate state-change intensities, but challenges arise when stages are potentially misclassified. We present an analytical likelihood approach where the hidden state is modeled as a three-state CTMC, allowing for some observed states to be possibly misclassified. Covariate effects of the hidden process and misclassification probabilities of the hidden state are estimated without information from a 'gold standard' as comparison. Parameter estimates are obtained using a modified expectation-maximization (EM) algorithm, and identifiability of CTMC estimation is addressed. Simulation studies and an application studying Alzheimer's disease caregiver stress levels are presented. The method was highly sensitive to detecting true misclassification and did not falsely identify error in the absence of misclassification. In conclusion, we have developed a robust longitudinal method for analyzing categorical outcome data when classification of disease severity stage is uncertain and the purpose is to study the process's transition behavior without a gold standard.
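In hidden Markov terms (our sketch of the structure described, not the authors' exact notation), the observed state $O(t)$ is linked to the hidden state $S(t)$ of the three-state CTMC through a misclassification matrix

$$e_{jk} = P(O(t) = k \mid S(t) = j),$$

while the hidden chain evolves under an intensity matrix $Q = (q_{jl})$, with covariates entering through, e.g., $q_{jl}(x) = q_{jl}^{(0)} \exp(x^\top \beta_{jl})$.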
6.
Reducing Monte Carlo error in the Bayesian estimation of risk ratios using log-binomial regression models
In cohort studies, binary outcomes are very often analyzed by logistic regression. However, it is well known that when the goal is to estimate a risk ratio, logistic regression is inappropriate if the outcome is common. In these cases, a log-binomial regression model is preferable. On the other hand, the estimation of the regression coefficients of the log-binomial model is difficult owing to the constraints that must be imposed on these coefficients. Bayesian methods allow a straightforward approach for log-binomial regression models and produce smaller mean squared errors in the estimation of risk ratios than the frequentist methods, and the posterior inferences can be obtained using the software WinBUGS. However, Markov chain Monte Carlo methods implemented in WinBUGS can lead to large Monte Carlo errors in the approximations to the posterior inferences because they produce correlated simulations, and the accuracy of the approximations is inversely related to this correlation. To reduce correlation and to improve accuracy, we propose a reparameterization based on a Poisson model and a sampling algorithm coded in R.
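For context (standard results, not specific to this paper): the logistic model $\operatorname{logit} P(Y=1\mid x) = \beta_0 + \beta^\top x$ yields odds ratios $e^{\beta_j}$, which overstate risk ratios when the outcome is common, whereas the log-binomial model

$$\log P(Y = 1 \mid x) = \beta_0 + \beta^\top x \;\Longrightarrow\; \mathrm{RR}_j = e^{\beta_j}$$

requires $\beta_0 + \beta^\top x \le 0$ for all observed $x$ to keep fitted probabilities in $[0,1]$; this is the constraint that complicates frequentist estimation and motivates the Poisson-based reparameterization.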
7.
Estimating recurrence and incidence of preterm birth subject to measurement error in gestational age: A hidden Markov modeling approach
Paul S. Albert. Statistics in Medicine 2018;37(12):1973-1985.
Prediction of preterm birth as well as characterizing the etiological factors affecting both the recurrence and incidence of preterm birth (defined as gestational age at birth ≤ 37 wk) are important problems in obstetrics. The National Institute of Child Health and Human Development (NICHD) consecutive pregnancy study recently examined this question by collecting data on a cohort of women with at least 2 pregnancies over a fixed time interval. Unfortunately, measurement error due to the dating of conception may induce sizable error in computing gestational age at birth. This article proposes a flexible approach that accounts for measurement error in gestational age when making inference. The proposed approach is a hidden Markov model that accounts for measurement error in gestational age by exploiting the relationship between gestational age at birth and birth weight. We initially model the measurement error as being normally distributed, followed by a mixture of normals that has been proposed on the basis of biological considerations. We examine the asymptotic bias of the proposed approach when measurement error is ignored and also compare the efficiency of this approach to a simpler hidden Markov model formulation where only gestational age and not birth weight is incorporated. The proposed model is compared with alternative models for estimating important covariate effects on the risk of subsequent preterm birth using a unique set of data from the NICHD consecutive pregnancy study.
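A sketch of the measurement-error structure described, in our notation (the error distributions follow the abstract; the linkage model is our assumption): observed gestational age is $G_i^* = G_i + \varepsilon_i$, with $\varepsilon_i \sim N(0, \sigma^2)$ initially and then $\varepsilon_i \sim \sum_k p_k\, N(\mu_k, \sigma_k^2)$; birth weight $B_i$ sharpens inference about the true $G_i$ through a model for $B_i \mid G_i$.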
8.
In a long-running longitudinal study using complex machinery to obtain measurements, it is sometimes necessary to replace the machine. This can result in lack of continuity in the measurements that can overwhelm any treatment effect or time trend. We propose a Bayesian procedure implemented using Markov chain Monte Carlo to calibrate the measurements on the old machine utilizing both person-specific and population information. The goal is to convert the previous measurements to values that can be treated as though they were made on the new machine. This methodology is applied to a bone mineral density study where the first densitometer uses gadolinium as the energy source (Lunar DP-3) and the second uses X-rays (Hologic QDR-1000W). Finally, simulation results are presented to show the superiority of the proposed method over existing methods of cross calibration.
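A hedged sketch of one calibration structure consistent with the description (the functional form is our assumption, not the authors'): old-machine readings satisfy

$$y_{ij}^{\text{old}} = a_i + b_i\, m_{ij} + \varepsilon_{ij},$$

where $m_{ij}$ is the value that would have been measured on the new machine and $(a_i, b_i)$ are person-specific calibration parameters shrunk toward population values via a prior; the converted measurement is then obtained from the posterior of $m_{ij}$ given $y_{ij}^{\text{old}}$.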
9.
Mapping complex traits or phenotypes with small genetic effects, whose expression may be modulated by temporal trends within families, is challenging. Detailed and accurate data must be available on families, whether or not the data were collected over time. Missing data complicate matters in pedigree analysis, especially in the case of a longitudinal pedigree analysis. Because most analytical methods developed for the analysis of longitudinal pedigree data require complete data, the researcher is left with the option of dropping those cases (individuals) with missing data from the analysis or imputing values for the missing data. We present the use of data augmentation within Bayesian polygenic and longitudinal polygenic models to produce k complete datasets. The data augmentation, or imputation step of the Markov chain Monte Carlo, takes into account the observed familial information and the observed subject information available at other time points. These k complete datasets can then be used to fit single time point or longitudinal pedigree models. By producing a set of k complete datasets and thus k sets of parameter estimates, the total variance associated with an estimate can be partitioned into a within-imputation and a between-imputation component. The method is illustrated using the Genetic Analysis Workshop simulated data.
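The within/between partition follows Rubin's multiple-imputation rules (a standard result, not specific to this paper): with $k$ completed datasets yielding estimates $\hat\theta_1, \ldots, \hat\theta_k$ and within-imputation variances $W_1, \ldots, W_k$,

$$T = \bar W + \left(1 + \tfrac{1}{k}\right) B, \qquad \bar W = \tfrac{1}{k}\sum_j W_j, \qquad B = \tfrac{1}{k-1}\sum_j (\hat\theta_j - \bar\theta)^2.$$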
10.
Profiling health care providers for the purpose of public reporting and quality improvement has become commonplace. Recently, the Centers for Medicare and Medicaid Services (CMS) began publishing measures of quality for every Medicare/Medicaid-certified nursing home in the country. The facility-specific quality indicators (QIs) reported by CMS are based on quarterly measures from the minimum data set (MDS). However, some QIs from the MDS are potentially subject to ascertainment bias. Ascertainment bias would occur if there was variation in the way items that make up QIs are measured by nurses from each facility. This is potentially a problem for difficult-to-measure items such as pain and pressure ulcers. To assess the impact of ascertainment bias on profiling, we utilize data from a reliability study of nursing homes from six states. We develop methods for profiling providers in situations where the data consist of a response variable for each subject based on assessments from an internal rater, and, for a subset of subjects in each facility, a response variable based on assessments from an independent (external) rater. The internal assessments are potentially subject to provider-level ascertainment bias, whereas the independent assessments are considered the 'gold standard'. Our methods extend popular Bayesian approaches for profiling by using the paired observations from the subset of subjects with error-prone and error-free assessments to adjust for ascertainment bias. We apply the methods to MDS merged with the reliability data, and compare the bias-corrected profiles with those of standard approaches.
11.
Many statistical methods have been developed that treat the within-subject correlation that accompanies the clustering of subjects in longitudinal data settings as a nuisance parameter, with the focus of analytic interest being on mean outcome or profiles over time. However, there is evidence that in certain settings, underlying variability in subject measures may also be important in predicting future health outcomes of interest. Here, we develop a method for combining information from mean profiles and residual variance to assess associations with categorical outcomes in a joint modeling framework. We consider an application to relating word recall measures obtained over time to dementia onset from the Health and Retirement Survey.
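One joint formulation consistent with this description, in our notation (a sketch; the exact link is our assumption): recall scores follow

$$y_{ij} = x_{ij}^\top \beta + b_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_i^2),$$

with subject-specific effects $b_i$ and residual variances $\sigma_i^2$, and the categorical outcome is modeled as, e.g., $\operatorname{logit} P(D_i = 1) = \alpha_0 + \alpha_1 b_i + \alpha_2 \log \sigma_i^2$, so both the mean profile and the within-subject variability carry predictive information.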
12.
Paul S. Albert. Statistics in Medicine 2019;38(2):175-183.
Naturalistic driving studies provide opportunities for investigating the effects of key driving exposures on risky driving performance and accidents. New technology provides a realistic assessment of risky driving through the intensive monitoring of kinematic behavior while driving. These studies, with their complex data structures, provide opportunities for statisticians to develop needed modeling techniques for statistical inference. This article discusses new statistical modeling procedures that were developed to specifically answer important analytical questions for naturalistic driving studies. However, these methodologies also have important applications for the analysis of intensively collected longitudinal data, an increasingly common data structure with the advent of wearable devices. To examine the sources of between- and within-participant variation in risky driving behavior, we explore the use of generalized linear mixed models with autoregressive random processes to analyze long sequences of kinematic count data from a group of teenagers with measurements at each trip over a 1.5-year observation period starting after they received their licenses. These models provide a regression framework for examining the effects of driving conditions and exposures on risky driving behavior. Alternatively, generalized estimating equations approaches are explored for the situation where we have intensively collected count measurements on a moderate number of participants. In addition to proposing statistical modeling for kinematic events, we explore models for relating kinematic events with crash risk. Specifically, we propose both latent variable and hidden Markov models for relating these 2 processes and for developing dynamic predictors of crash risk from longitudinal kinematic event data. These different statistical modeling techniques are all used to analyze data from the Naturalistic Teenage Driving Study, a unique investigation into how teenagers drive after licensure.
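A sketch of the autoregressive count model described, in our notation (the offset and AR(1) form are our reading of the description): trip-level kinematic event counts $Y_{it}$ satisfy

$$Y_{it} \mid b_i, u_{it} \sim \text{Poisson}(\mu_{it}), \qquad \log \mu_{it} = \log E_{it} + x_{it}^\top \beta + b_i + u_{it},$$

with exposure offset $E_{it}$ (e.g., miles driven), driver effect $b_i \sim N(0, \sigma_b^2)$ capturing between-participant variation, and an AR(1) process $u_{it} = \rho\, u_{i,t-1} + \eta_{it}$ capturing serial within-participant variation across trips.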
13.
Riddhi Pratim Ghosh, Arnab K. Maity, Mohsen Pourahmadi, Bani K. Mallick. Genetic Epidemiology 2023;47(1):95-104.
The clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling using an angle-based unconstrained reparameterization of correlations, and assume a truncated Poisson distribution (to penalize a large number of clusters) as the prior on the number of clusters. The posterior distributions of the parameters are not available in explicit form, so a reversible jump Markov chain Monte Carlo technique is used to simulate the parameters from the posteriors. The end products of the proposed method are the estimated cluster configuration of the proteins (variables) along with the number of clusters. The Bayesian method is flexible enough to cluster the proteins as well as estimate the number of clusters. The performance of the proposed method has been substantiated with extensive simulation studies and a protein expression dataset with a hereditary disposition in breast cancer, in which the proteins come from different pathways.
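The cluster-number penalty can be read as follows (a sketch; the truncation range $K_{\max}$ is our assumption): the prior on the number of clusters $K$ is a truncated Poisson,

$$P(K = k) \propto \frac{\lambda^k}{k!}, \qquad k = 1, \ldots, K_{\max},$$

so small $\lambda$ penalizes large $k$, and reversible-jump moves let the sampler add or delete clusters across model dimensions.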
14.
We present methods for binomial regression when the outcome is determined using the results of a single diagnostic test with imperfect sensitivity and specificity. We present our model, illustrate it with the analysis of real data, and provide an example of WinBUGS program code for performing such an analysis. Conditional means priors are used in order to allow for inclusion of prior data and expert opinion in the estimation of odds ratios, probabilities, risk ratios, risk differences, and diagnostic test sensitivity and specificity. A simple method of obtaining Bayes factors for link selection is presented. Methods are illustrated and compared with Bayesian ordinary binary regression using data from a study of the effectiveness of a smoking cessation program among pregnant women. Regression coefficient estimates are shown to change noticeably when expert prior knowledge and imperfect sensitivity and specificity are incorporated into the model.
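The key identity linking true and apparent outcome probabilities under an imperfect test (a standard relation, in our notation): with true-outcome probability $p(x)$ from the regression and test sensitivity $Se$ and specificity $Sp$, the test-based outcome $Y^*$ satisfies

$$P(Y^* = 1 \mid x) = Se \cdot p(x) + (1 - Sp)\,(1 - p(x)),$$

and the conditional means priors are then placed on $p$ at selected covariate values, together with priors on $(Se, Sp)$.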
15.
Misclassification in a binary exposure variable within an unmatched prospective study may lead to a biased estimate of the disease-exposure relationship. It usually gives falsely small credible intervals because uncertainty in the recorded exposure is not taken into account. When there are several other perfectly measured covariates, interrelationships may introduce further potential for bias. Bayesian methods are proposed for analysing binary outcome studies in which an exposure variable is sometimes misclassified, but its correct values have been validated for a random subsample of the subjects. This Bayesian approach can model relationships between explanatory variables and between explanatory variables and the probabilities of misclassification. Three logistic regressions are used to relate disease to true exposure, misclassified exposure to true exposure, and true exposure to other covariates. Credible intervals may be used to make decisions about whether certain parameters are unnecessary and hence whether the model can be reduced in complexity. In the disease-exposure model, for parameters representing coefficients related to perfectly measured covariates, the precision of posterior estimates is only slightly lower than would be found from data with no misclassification. For the risk factor subject to misclassification, the estimated model coefficients are much less biased than those obtained when misclassification is ignored.
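The three-regression structure, in our notation (a sketch of the architecture described): with disease $D$, true exposure $X$, recorded exposure $X^*$, and covariates $Z$,

$$\operatorname{logit} P(D = 1 \mid X, Z) = \beta_0 + \beta_1 X + \beta_2^\top Z \quad \text{(disease model)},$$
$$\operatorname{logit} P(X^* = 1 \mid X, Z) = \gamma_0 + \gamma_1 X + \gamma_2^\top Z \quad \text{(misclassification model)},$$
$$\operatorname{logit} P(X = 1 \mid Z) = \delta_0 + \delta_1^\top Z \quad \text{(exposure model)},$$

with the validation subsample contributing subjects for whom both $X$ and $X^*$ are observed.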
16.
Bayesian monotonic errors-in-variables models with applications to pathogen susceptibility testing
Drug dilution (MIC) and disk diffusion (DIA) are the 2 most common antimicrobial susceptibility assays used by hospitals and clinics to determine an unknown pathogen's susceptibility to various antibiotics. Since typically only one assay is used, it is important that the 2 assays give similar results. Calibration of the DIA assay to the MIC assay is typically done using the error-rate bounded method, which selects DIA breakpoints that minimize the observed discrepancies between the 2 assays. In 2000, Craig proposed a model-based approach that specifically models the measurement error and rounding processes of each assay, the underlying pathogen distribution, and the true monotonic relationship between the 2 assays. The 2 assays are then calibrated by focusing on matching the probabilities of correct classification (susceptible, indeterminate, and resistant). This approach results in greater precision and accuracy for estimating DIA breakpoints. In this paper, we expand the flexibility of the model-based method by introducing a Bayesian 4-parameter logistic model (extending Craig's original 3-parameter model) as well as a Bayesian nonparametric spline model to describe the relationship between the 2 assays. We propose 2 ways to handle spline knot selection: considering many equally spaced knots but restricting overfitting via a random walk prior, and treating the number and location of knots as additional unknown parameters. We demonstrate the 2 approaches via a series of simulation studies and apply the methods to 2 real data sets.
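One common 4-parameter logistic form (our assumption as to parameterization; Craig's 3-parameter version would fix one asymptote):

$$f(x) = \theta_1 + \frac{\theta_2 - \theta_1}{1 + \exp\{-\theta_3 (x - \theta_4)\}},$$

with lower and upper asymptotes $\theta_1, \theta_2$, slope $\theta_3$, and inflection point $\theta_4$, describing the monotone relationship between the two assay scales.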
17.
Prostate cancer is one of the most common cancers in American men. The cancer could either be locally confined, or it could spread outside the organ. When locally confined, there are several options for treating and curing this disease. Otherwise, surgery is the only option, and in extreme cases of outside spread, it could very easily recur within a short time even after surgery and subsequent radiation therapy. Hence, it is important to know, based on pre-surgery biopsy results, how likely it is that the cancer is organ-confined. The paper considers a hierarchical Bayesian neural network approach for posterior prediction probabilities of certain features indicative of non-organ-confined prostate cancer. In particular, we find such probabilities for margin positivity (MP) and seminal vesicle (SV) positivity jointly. The available training set consists of bivariate binary outcomes indicating the presence or absence of the two. In addition, we have certain covariates such as prostate specific antigen (PSA), Gleason score, and the indicator for the cancer to be unilateral or bilateral (i.e. spread on one or both sides) in one data set, and gene expression microarrays in another data set. We take a hierarchical Bayesian neural network approach to find the posterior prediction probabilities for a test and validation set, and compare these with the actual outcomes for the first data set. In the case of the microarray data, we use leave-one-out cross-validation to assess the accuracy of our method. We also demonstrate the superiority of our method to the other competing methods through a simulation study. The Bayesian procedure is implemented by an application of the Markov chain Monte Carlo numerical integration technique. For the problem at hand, our Bayesian bivariate neural network procedure is shown to be superior to the classical neural network, Radford Neal's Bayesian neural network, as well as bivariate logistic models, in predicting jointly the MP and SV in a patient in both the data sets as well as in the simulation study.
18.
The transmission/disequilibrium test (TDT) for binary traits is a powerful method for detecting linkage between a marker locus and a trait locus in the presence of allelic association. The TDT uses information on the parent-to-offspring transmission status of the associated allele at the marker locus to assess linkage or association in the presence of the other, using one affected offspring from each set of parents. For testing for linkage in the presence of association, more than one offspring per family can be used. However, without incorporating the correlation structure among offspring, it is not possible to correctly assess the association in the presence of linkage. In this presentation, we propose a Bayesian TDT method as a complementary alternative to the classical approach. In the hypothesis testing setup, given two competing hypotheses, the Bayes factor can be used to weigh the evidence in favor of one of them, thus allowing us to decide between the two hypotheses using established criteria. We compare the proposed Bayesian TDT with a competing frequentist-testing method with respect to power and type I error validity. If we know the mode of inheritance of the disease, then the joint and marginal posterior distributions for the recombination fraction (theta) and disequilibrium coefficient (delta) can be obtained via standard MCMC methods, which lead naturally to Bayesian credible intervals for both parameters.
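The Bayes factor underlying this comparison (standard definition): for hypotheses $H_0$ and $H_1$ with data $D$ and model parameters $\phi$,

$$BF_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)} = \frac{\int p(D \mid \phi_1, H_1)\, \pi(\phi_1 \mid H_1)\, d\phi_1}{\int p(D \mid \phi_0, H_0)\, \pi(\phi_0 \mid H_0)\, d\phi_0},$$

compared against conventional thresholds (e.g., Jeffreys' scale) to weigh the evidence for linkage or association; here the parameters include the recombination fraction $\theta$ and disequilibrium coefficient $\delta$.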
19.
In many areas of medical research, 'gold standard' diagnostic tests do not exist and so evaluating the performance of standardized diagnostic criteria or algorithms is problematic. In this paper we propose an approach to evaluating the operating characteristics of diagnoses using a latent class model. By defining 'true disease' as our latent variable, we are able to estimate sensitivity, specificity and negative and positive predictive values of the diagnostic test. These methods are applied to diagnostic criteria for depression using Baltimore's Epidemiologic Catchment Area Study Wave 3 data.
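The two-class latent class likelihood implicit in this approach (standard form, our notation): with $J$ binary diagnostic items $y = (y_1, \ldots, y_J)$ and latent true disease status $C \in \{0, 1\}$,

$$P(y) = \sum_{c=0}^{1} \pi_c \prod_{j=1}^{J} p_{jc}^{\,y_j} (1 - p_{jc})^{1 - y_j}$$

under conditional independence given $C$; the sensitivity of item $j$ is $p_{j1}$, its specificity is $1 - p_{j0}$, and the predictive values follow from Bayes' theorem.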
20.
In tumour xenograft experiments, treatment regimens are administered, and the tumour volume of each individual is measured repeatedly over time. Survival data are recorded because of the death of some individuals during the observation period. Also, cure data are observed because of a portion of individuals who are completely cured in the experiments. When modelling these data, certain constraints have to be imposed on the parameters in the models to account for the intrinsic growth of the tumour in the absence of treatment. Also, the likely inherent association of longitudinal and survival-cure data has to be taken into account in order to obtain unbiased estimators of parameters. In this paper, we propose such models for the joint modelling of longitudinal and survival-cure data arising in xenograft experiments. Estimators of parameters in the joint models are obtained using a Markov chain Monte Carlo approach. Real data analysis of a xenograft experiment is carried out, and simulation studies are also conducted, showing that the proposed joint modelling approach outperforms the separate modelling methods in the sense of mean squared errors.