Similar articles
1.
In longitudinal studies, missing observations are common. It is well known that biased results can be produced if missingness is not properly handled in the analysis. Many methods have been developed that focus on either incomplete responses or missing covariate observations, but rarely on both. The complexity of modeling and the computational difficulty are the major challenges in handling missingness in both response and covariate variables. In this paper, we develop methods based on the pairwise likelihood formulation to handle longitudinal binary data with missing observations in both the response and covariate variables. We propose a unified framework that accommodates various types of missing data patterns. We evaluate the performance of the methods empirically under a variety of circumstances; in particular, we investigate issues of efficiency and robustness. We analyze longitudinal data from the National Population Health Study using our methods. Copyright © 2012 John Wiley & Sons, Ltd.

2.
Assessing continuous covariates individually as possible predictors in a multivariable logistic regression model is an important first step in the analysis. An approach to plotting that uses a cusum (cumulative sum) of the binary response variable is described. Extreme-deviation statistics associated with the cusum may be used to detect monotonic and non-monotonic trends. Probability plots of the covariate in the two groups defined by the response variable may help to determine the appropriate scale (transformation) of the covariate and to anticipate possible problems with the logistic fit. The ratio of the variances in the response and non-response groups is informative about whether a quadratic term is needed in the logistic model. Smoothed scatterplots of the response are valuable for displaying the observed and fitted values. The techniques are illustrated with two data sets.
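
The cusum idea in entry 2 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions and is not the authors' exact procedure. Subjects are ordered by the covariate, and the binary response is centered at the overall response rate so that the path starts and ends at zero. The "extreme deviation" is taken here to be the largest absolute excursion of the path. The function names `cusum_path` and `max_excursion` are hypothetical.

```python
from typing import Sequence


def cusum_path(x: Sequence[float], y: Sequence[int]) -> list[float]:
    """Cumulative sum of (y - ybar) with subjects ordered by covariate x.

    Assumption: centering at the overall response rate ybar, so the path
    ends at (numerically) zero; a path that stays near zero suggests no trend.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    ybar = sum(y) / len(y)
    order = sorted(range(len(x)), key=lambda i: x[i])
    path: list[float] = []
    running = 0.0
    for i in order:
        running += y[i] - ybar
        path.append(running)
    return path


def max_excursion(path: Sequence[float]) -> float:
    """Largest absolute deviation of the cusum path from zero."""
    return max(abs(v) for v in path)


if __name__ == "__main__":
    # Responses rise with x, so the path dips once and returns to zero (a V shape).
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [0, 0, 0, 1, 1, 1]
    path = cusum_path(x, y)
    print(path)                 # [-0.5, -1.0, -1.5, -1.0, -0.5, 0.0]
    print(max_excursion(path))  # 1.5
```

A monotonic trend shows up as a single dip or peak. A non-monotonic relationship produces several swings of the path.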

3.
Misclassification of a binary exposure variable in an unmatched prospective study may lead to a biased estimate of the disease-exposure relationship. It also usually gives falsely narrow credible intervals, because uncertainty in the recorded exposure is not taken into account. When there are several other perfectly measured covariates, their interrelationships may introduce further potential for bias. Bayesian methods are proposed for analysing binary outcome studies in which an exposure variable is sometimes misclassified, but its correct values have been validated for a random subsample of the subjects. This Bayesian approach can model relationships among the explanatory variables, and between the explanatory variables and the probabilities of misclassification. Three logistic regressions are used to relate disease to true exposure, misclassified exposure to true exposure, and true exposure to the other covariates. Credible intervals may be used to decide whether certain parameters are unnecessary and hence whether the model can be reduced in complexity. In the disease-exposure model, for the coefficients of perfectly measured covariates, the precision of the posterior estimates is only slightly lower than would be obtained from data with no misclassification. For the misclassified risk factor, the estimated model coefficients are much less biased than those obtained when misclassification is ignored.

4.
Maximum likelihood methods have been used to incorporate partially observed covariate values when fitting logistic regression models. We extend these methods to data collected through complex surveys using the pseudo-likelihood approach. Parameter estimates of the logistic regression model can be obtained with standard statistical software, and their standard errors by Taylor series expansion or the jackknife method. We apply the approach to data from a two-phase survey screening for dementia in a community sample of African Americans aged 65 and older living in Indianapolis. The binary response variable is dementia, and the covariate with missing values is a daily functioning score collected from interviews with a relative of the study subject. © 1997 John Wiley & Sons, Ltd.

5.
We describe a methodology for analysing transitions over time in a binary outcome variable that is subject to misclassification (that is, measurement error). Logistic regression models for transition events in the true underlying state are combined with estimates of probabilities of misclassification of the underlying state. The model is based on the Markovian assumption that the probabilities of transition in the underlying state at a given time depend only on the underlying state at the previous time. Hence we estimate odds-ratio effects for transitions that are adjusted for the effect of misclassification. Comparing these adjusted estimates with estimates that are obtained without taking misclassification into account indicates that the latter can be biased either toward or away from the null. For the estimates to exist, certain restrictions on the observed data and misclassification probabilities need to be met. If these restrictions are not satisfied then the conclusion from the analysis is that all observed transition events can be explained solely by the error in outcome assessment, in which case it is likely that an aspect of the model is incorrect. The motivation for this work comes from an analysis of transitions in depression status for a cohort of Australian teenagers participating in a longitudinal study of adolescent health.

6.
This research is motivated by the study of the progression of age‐related macular degeneration, where both a covariate and the response variable are subject to censoring. We develop a general framework for regression with a censored covariate, where the response can be of different types and the censoring can be random or due to (constant) detection limits. Multiple imputation is a popular technique for handling missing data; it requires compatibility between the imputation model and the substantive model to obtain valid estimates. For a censored covariate, we propose a novel multiple imputation‐based approach, the semiparametric two‐step importance sampling imputation (STISI) method, to impute the censored covariate. Specifically, STISI imputes the missing covariate from a semiparametric accelerated failure time model conditional on the fully observed covariates (Step 1), with an acceptance probability derived from the substantive model (Step 2). The two‐step procedure automatically ensures compatibility and takes full advantage of the relaxed semiparametric assumption in the imputation. Extensive simulations demonstrate that the STISI method yields valid estimates in all scenarios and outperforms some existing methods that are commonly used in practice. We apply STISI to data from the Age‐related Eye Disease Study to investigate the association between the progression time of the less severe eye and that of the more severe eye. We also illustrate the method by analyzing urine arsenic data for patients from the National Health and Nutrition Examination Survey (2003‐2004), where the response is binary and one covariate is subject to a detection limit.

7.
The Michigan Female Health Study (MFHS) conducted research focusing on reproductive health outcomes among women exposed to polybrominated biphenyls (PBBs). In the work presented here, the available longitudinal serum PBB exposure measurements are used to obtain predictions of PBB exposure for specific time points of interest via random effects models. In a two‐stage approach, a prediction of the PBB exposure is obtained and then used in a second‐stage health outcome model. This paper illustrates how a unified approach, which links the exposure and outcome in a joint model, provides an efficient adjustment for covariate measurement error. We compare the use of empirical Bayes predictions in the two‐stage approach with results from a joint modeling approach, with and without an adjustment for left‐ and interval‐censored data. The unified approach with the adjustment for left‐ and interval‐censored data resulted in little bias and near‐nominal confidence interval coverage in both the logistic and linear model settings. Published in 2010 by John Wiley & Sons, Ltd.

8.
The outcome‐dependent sampling (ODS) scheme is a cost‐effective sampling scheme in which the exposure is observed with a probability that depends on the outcome. Well‐known examples of such designs are the case‐control design for a binary response, the case‐cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS with multivariate responses remain underdeveloped. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome‐dependent sampling (multivariate‐ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate‐ODS design is semiparametric: all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple‐random‐sample portion of the multivariate‐ODS or the estimator from a simple random sample of the same size. The multivariate‐ODS design together with the proposed estimator provides an approach to further improving study efficiency for a fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children in the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.

9.
In this paper we consider longitudinal studies in which the outcome to be measured over time is binary and the covariates of interest are categorical. In longitudinal studies, it is common for the outcomes and any time-varying covariates to be missing because of missed study visits, resulting in non-monotone patterns of missingness. Moreover, the reasons for missed visits may be related to the specific values of the response and/or covariates that should have been obtained; that is, missingness is non-ignorable. With non-monotone non-ignorable missing response and covariate data, a full likelihood approach is quite complicated, and maximum likelihood estimation can be computationally prohibitive when there are many follow-up occasions. Furthermore, the full likelihood must be correctly specified to obtain consistent parameter estimates. We propose a pseudo-likelihood method for jointly estimating the covariate effects on the marginal probabilities of the outcomes and the parameters of the missing data mechanism. The pseudo-likelihood requires specification of the marginal distributions of the missingness indicator, outcome, and possibly missing covariates at each occasion, but avoids making assumptions about the joint distribution of the data at two or more occasions. Thus, the proposed method can be considered semiparametric. It extends the pseudo-likelihood approach of Troxel et al. to handle binary responses and possibly missing time-varying covariates. The method is illustrated using data from the Six Cities study, a longitudinal study of the health effects of air pollution.

10.
Missing data are common in longitudinal studies because of drop‐out, loss to follow‐up, and death. Likelihood‐based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). This assumption, however, is not testable without further information. In some studies, additional information is available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Such auxiliary information provides an opportunity to test the MAR assumption. If the MAR assumption is violated, this information can be used to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of using the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood‐based joint modeling approach is consistent and most efficient when correctly specified. However, misspecification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is misspecified. It is, however, more robust to misspecification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study. Copyright © 2009 John Wiley & Sons, Ltd.

11.
We discuss maximum likelihood methods for analysing binary responses measured at two times, such as in a cross-over design. We construct a 2 × 2 table for each individual, with cell probabilities corresponding to the cross-classification of the responses at the two times; the underlying likelihood for each individual is multinomial with four cells. The three-dimensional parameter space of the multinomial distribution is completely specified by the two marginal probabilities of success of the 2 × 2 table and an association parameter between the binary responses at the two times. We examine a logistic model for the marginal probabilities of the 2 × 2 table for individual i; the association parameters we consider are the correlation coefficient, the odds ratio, and the relative risk. Simulations show that the parameter estimates for the logistic regression model for the marginal probabilities are not very sensitive to the parameter used to describe the association between the binary responses at the two times. Thus, we suggest choosing the measure of association for ease of interpretation.
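
For the odds-ratio parameterization in entry 11, the four cell probabilities of an individual's 2 × 2 table can be recovered in closed form from the two margins and the odds ratio. The sketch below takes the relevant root of the quadratic implied by the odds-ratio constraint; it covers only the odds-ratio case, not the correlation or relative-risk versions. The function names are hypothetical. Fitting the marginal logistic models is assumed to happen elsewhere, for example via `expit` of a linear predictor.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a marginal probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def cell_probs(p1: float, p2: float, psi: float) -> tuple[float, float, float, float]:
    """Return (p11, p10, p01, p00) for a 2 x 2 table with margins
    P(Y1=1) = p1, P(Y2=1) = p2 and odds ratio psi = p11*p00 / (p10*p01)."""
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0 and psi > 0.0):
        raise ValueError("need 0 < p1, p2 < 1 and psi > 0")
    if math.isclose(psi, 1.0):
        p11 = p1 * p2  # odds ratio 1 means independence
    else:
        # Solve (psi-1)*p11^2 - [1 + (psi-1)(p1+p2)]*p11 + psi*p1*p2 = 0.
        s = 1.0 + (psi - 1.0) * (p1 + p2)
        disc = s * s - 4.0 * psi * (psi - 1.0) * p1 * p2
        # The minus-sign root is the one inside max(0, p1+p2-1) < p11 < min(p1, p2).
        p11 = (s - math.sqrt(disc)) / (2.0 * (psi - 1.0))
    return p11, p1 - p11, p2 - p11, 1.0 - p1 - p2 + p11


def loglik_individual(y1: int, y2: int, p1: float, p2: float, psi: float) -> float:
    """Multinomial (four-cell) log-likelihood contribution of one individual."""
    p11, p10, p01, p00 = cell_probs(p1, p2, psi)
    cells = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    return math.log(cells[(y1, y2)])


if __name__ == "__main__":
    print(cell_probs(0.5, 0.5, 2.25))  # approximately (0.3, 0.2, 0.2, 0.3)
```

Summing `loglik_individual` over individuals, with `p1` and `p2` given by the marginal logistic models, yields the log-likelihood to be maximized.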

12.
BACKGROUND: The multinomial logistic regression model is used to model the relationship between an outcome variable with more than two categories and a set of covariates. This model is not widely used in epidemiology. We discuss the value of the multinomial model by comparing it with the binary logistic model, and we present a statistical comparison of odds ratios (OR) using the multinomial model. We studied the associations between obstetric history and very (< 33 weeks of amenorrhea) and moderate (33-36 weeks) preterm births. METHODS: Parameters (lnOR) of very and moderate preterm births associated with the severity of obstetric history (none=0, moderate=1, severe=2) were estimated using two binary logistic models (moderate preterm births vs full-term births (≥ 37 weeks), and very preterm births vs full-term births) and one multinomial logistic model comparing very and moderate preterm births with full-term births. These analyses were performed before and after adjustment for one covariate, the country of survey. The parameters for very preterm birth and moderate preterm birth estimated from the multinomial model were compared using a Wald test. These analyses used data from a large case-control survey in Europe, the EUROPOP survey; 1,675 very preterm births, 3,652 moderate preterm births and 7,965 full-term births were included. RESULTS: Crude parameters of very and moderate preterm births were similar regardless of the logistic regression model, binary or multinomial. The estimated parameters differed slightly after adjustment for the covariate, but lower variance estimates were obtained with the multinomial logistic regression model. The parameters of very preterm birth associated with moderate obstetric history, B(gp)=0.5040, and severe obstetric history, B(gp)'=1.545, differ significantly from those of moderate preterm birth, B(pm)=0.4434 and B(pm)'=1.223 respectively (p < 0.001).
CONCLUSION: Parameters obtained in separate binary logistic models are close to those obtained in a multinomial model. The multinomial model is useful for testing the heterogeneity of risk factors for distinct health problems.
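
The comparison in entry 12 uses a Wald test of whether two coefficients from the same multinomial model are equal. Below is a minimal sketch of that statistic. It assumes the coefficients, their variances and their covariance have already been extracted from a fitted model's covariance matrix. The function name is hypothetical, and the numbers in the usage line are illustrative only, not values from the EUROPOP analysis.

```python
import math


def wald_equal_coefs(b1: float, b2: float, var1: float, var2: float,
                     cov12: float = 0.0) -> tuple[float, float]:
    """Wald z statistic and two-sided p-value for H0: b1 == b2.

    Coefficients from one multinomial model are correlated, so cov12 should
    come from that model's estimated covariance matrix.
    """
    var_diff = var1 + var2 - 2.0 * cov12
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = (b1 - b2) / math.sqrt(var_diff)
    # Two-sided standard normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Illustrative inputs only.
    print(wald_equal_coefs(1.0, 0.5, 0.04, 0.04, 0.02))  # z = 2.5, p about 0.012
```

The squared statistic `z**2` is the equivalent 1-df chi-square form of the Wald test.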

13.
It is of increasing interest in statistical genetics to test for the presence of an additive interaction between genetic (G) and environmental (E) risk factors. In case-control studies involving a rare disease, a statistical test of no additive G × E interaction typically entails a test of no relative excess risk due to interaction (RERI). It has been shown that a likelihood ratio test of a null RERI incorporating the G-E independence assumption (RERI-LRT) outperforms the standard approach. The RERI-LRT relies on correct specification of a logistic model for the binary outcome as a function of G, E, and auxiliary covariates. However, when at least one exposure is not categorical or auxiliary covariates are present, nonparametric estimation may not be feasible. Parametric logistic regression, meanwhile, will a priori rule out the null hypothesis of no additive interaction in most practical situations, inflating the type I error rate. In this paper, we present a general approach to test for G × E additive interaction that exploits G-E independence. Unlike the RERI-LRT, it leaves the regression model for the binary outcome unrestricted, and it still allows covariate adjustment to ensure that the G-E independence assumption holds or to rule out residual confounding. The methods are illustrated through extensive simulation studies and an ovarian cancer study.
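
For entry 13, RERI is defined as RR11 − RR10 − RR01 + 1, with each relative risk taken relative to the doubly unexposed group. For a rare disease, odds ratios stand in for the relative risks. The sketch below computes RERI directly and also from the coefficients of a logistic model with binary G and E. It illustrates the point in the abstract. Without a product term, that model implies RERI = (exp(βG) − 1)(exp(βE) − 1), which is zero only when one main effect is null. The function names are hypothetical. This is not the test proposed in the paper.

```python
import math


def reri(or11: float, or10: float, or01: float) -> float:
    """Relative excess risk due to interaction, using odds ratios as
    stand-ins for relative risks (rare-disease approximation).
    All ratios are relative to the G=0, E=0 group."""
    return or11 - or10 - or01 + 1.0


def reri_from_logit(beta_g: float, beta_e: float, beta_ge: float) -> float:
    """RERI implied by logit P(D=1) = a + beta_g*G + beta_e*E + beta_ge*G*E
    for binary G and E."""
    return reri(math.exp(beta_g + beta_e + beta_ge),
                math.exp(beta_g), math.exp(beta_e))


if __name__ == "__main__":
    # No multiplicative interaction (beta_ge = 0), yet RERI = (2-1)*(3-1) = 2.
    print(reri_from_logit(math.log(2.0), math.log(3.0), 0.0))  # approximately 2.0
```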

14.
The cumulative logit, or proportional odds, regression model is commonly used to study covariate effects on ordinal responses. This paper provides graphical and numerical methods for checking the adequacy of the proportional odds regression model. The methods focus on evaluating functional misspecification of specific covariate effects, but misspecification of the link function can also be handled within the same framework. For the logistic regression model with binary responses, Arbogast and Lin (Statist. Med. 2005; 24:229-247) developed similar graphical and numerical methods for assessing model adequacy using cumulative sums of residuals. This paper generalizes their methods to ordinal responses and illustrates them using an example from the VA Normative Aging Study. Simulation studies comparing the performance of the different diagnostic methods indicate that some of the graphical methods are more powerful in detecting model misspecification than Hosmer-Lemeshow-type goodness-of-fit statistics for the class of models studied.

15.
The potential for bias due to misclassification error in regression analysis is well understood by statisticians and epidemiologists. Assuming little or no available data for estimating misclassification probabilities, investigators sometimes seek to gauge the sensitivity of an estimated effect to variations in the assumed values of those probabilities. We present an intuitive and flexible approach to such a sensitivity analysis, assuming an underlying logistic regression model. For outcome misclassification, we argue that a likelihood‐based analysis is the cleanest and the most preferable approach. In the case of covariate misclassification, we combine observed data on the outcome, error‐prone binary covariate of interest, and other covariates measured without error, together with investigator‐supplied values for sensitivity and specificity parameters, to produce corresponding positive and negative predictive values. These values serve as estimated weights to be used in fitting the model of interest to an appropriately defined expanded data set using standard statistical software. Jackknifing provides a convenient tool for incorporating uncertainty in the estimated weights into valid standard errors to accompany log odds ratio estimates obtained from the sensitivity analysis. Examples illustrate the flexibility of this unified strategy, and simulations suggest that it performs well relative to a maximum likelihood approach carried out via numerical optimization. Copyright © 2010 John Wiley & Sons, Ltd.
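
The weighting idea in entry 15 can be sketched for a single stratum. The sketch starts from investigator-supplied sensitivity and specificity and an observed prevalence of the error-prone covariate W. From these it derives the implied true prevalence and then the predictive values. Each record is then duplicated into an "X = 1" row and an "X = 0" row, weighted by the estimated P(X | W). In the paper the predictive values also draw on the outcome and the other covariates, so in practice they would be computed within those strata. The weighted logistic fit and the jackknife step are not shown. The function names are hypothetical.

```python
def predictive_values(se: float, sp: float, p_obs: float) -> tuple[float, float]:
    """PPV = P(X=1 | W=1) and NPV = P(X=0 | W=0) for an error-prone binary W.

    se = P(W=1 | X=1), sp = P(W=0 | X=0), p_obs = observed P(W=1).
    The true prevalence pi solves p_obs = se*pi + (1 - sp)*(1 - pi).
    """
    if se + sp <= 1.0:
        raise ValueError("need se + sp > 1")
    if not 0.0 < p_obs < 1.0:
        raise ValueError("need 0 < p_obs < 1")
    pi = (p_obs + sp - 1.0) / (se + sp - 1.0)
    if not 0.0 <= pi <= 1.0:
        raise ValueError("se/sp values are incompatible with the observed prevalence")
    ppv = se * pi / p_obs
    npv = sp * (1.0 - pi) / (1.0 - p_obs)
    return ppv, npv


def expand_records(records: list[tuple[int, int]], ppv: float,
                   npv: float) -> list[tuple[int, int, float]]:
    """Turn each (y, w) record into two (y, x, weight) rows, x = 1 and x = 0,
    weighted by the estimated P(X = x | W = w); the weights of each pair sum to 1."""
    rows: list[tuple[int, int, float]] = []
    for y, w in records:
        p_x1 = ppv if w == 1 else 1.0 - npv
        rows.append((y, 1, p_x1))
        rows.append((y, 0, 1.0 - p_x1))
    return rows


if __name__ == "__main__":
    ppv, npv = predictive_values(se=0.9, sp=0.8, p_obs=0.4)
    print(round(ppv, 4), round(npv, 4))  # 0.6429 0.9524
```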

16.
This study develops a two-part hidden Markov model (HMM) for analyzing semicontinuous longitudinal data in the presence of missing covariates. The proposed model handles a semicontinuous variable by splitting it into two random variables: a binary indicator for the occurrence of excess zeros at all occasions and a continuous random variable for its actual level. For the continuous longitudinal response, an HMM is proposed to describe the relationship between the observations and the unobservable finite-state transition process. The HMM consists of two major components. The first component is a transition model for investigating how potential covariates influence the probabilities of transitioning from one hidden state to another. The second component is a conditional regression model for examining the state-specific effects of covariates on the response. A shared random effect is introduced into each part of the model to accommodate possible unobservable heterogeneity among observation processes and the nonignorability of missing covariates. A Bayesian adaptive least absolute shrinkage and selection operator (lasso) procedure is developed to conduct simultaneous variable selection and estimation. The proposed methodology is applied to data from the Alzheimer's Disease Neuroimaging Initiative. New insights into the pathology of Alzheimer's disease and its potential risk factors are obtained.

17.
Missing outcomes are a common problem in cluster randomised trials and can lead to biased and inefficient inference if ignored or handled inappropriately. Two approaches for analysing such trials are cluster‐level analysis and individual‐level analysis. In this study, we assessed the performance of unadjusted cluster‐level analysis, baseline covariate‐adjusted cluster‐level analysis, random effects logistic regression and generalised estimating equations when binary outcomes are missing under a baseline covariate‐dependent missingness mechanism. Missing outcomes were handled using complete records analysis and multilevel multiple imputation. We show analytically that cluster‐level analyses for estimating the risk ratio using complete records are valid if the true data‐generating model has a log link and the intervention groups have the same missingness mechanism and the same covariate effect in the outcome model. We performed a simulation study considering four scenarios, depending on whether the missingness mechanisms are the same or different between the intervention groups and whether there is an interaction between intervention group and baseline covariate in the outcome model. On the basis of the simulation study and analytical results, we give guidance on the conditions under which each approach is valid. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
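
An unadjusted cluster-level analysis with complete records, as in entry 17, can be sketched as follows. Each cluster is summarized by its event proportion among the records with an observed outcome. The risk ratio is the ratio of the arm-level means of those proportions. Using the arithmetic mean of cluster proportions is our assumption, since other cluster-level summaries are also used. The covariate-adjusted version and multilevel multiple imputation are not shown. The function names are hypothetical.

```python
from typing import Optional, Sequence

# A cluster is a list of binary outcomes; None marks a missing outcome.
Cluster = Sequence[Optional[int]]


def cluster_proportion(cluster: Cluster) -> float:
    """Event proportion among the complete records of one cluster."""
    observed = [v for v in cluster if v is not None]
    if not observed:
        raise ValueError("cluster has no observed outcomes")
    return sum(observed) / len(observed)


def cluster_level_risk_ratio(intervention: Sequence[Cluster],
                             control: Sequence[Cluster]) -> float:
    """Unadjusted cluster-level risk ratio: ratio of the arm means of the
    cluster-specific complete-record proportions."""
    mean_int = sum(map(cluster_proportion, intervention)) / len(intervention)
    mean_ctl = sum(map(cluster_proportion, control)) / len(control)
    return mean_int / mean_ctl


if __name__ == "__main__":
    intervention = [[1, 0, None, 1], [0, 0, 1]]  # proportions 2/3 and 1/3
    control = [[0, 0, 1, None], [0, 1, 0, 0]]    # proportions 1/3 and 1/4
    print(cluster_level_risk_ratio(intervention, control))  # about 1.714 (= 12/7)
```

Per the analytical result above, this complete-records estimate is valid only under the stated conditions: a log-link data-generating model, and the same missingness mechanism and covariate effect in both intervention groups.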

18.
In the development of risk prediction models, predictors are often measured with error. In this paper, we investigate the impact of covariate measurement error on risk prediction. We compare the prediction performance using a costly variable measured without error, along with error‐free covariates, to that of a model based on an inexpensive surrogate along with the error‐free covariates. We consider continuous error‐prone covariates with homoscedastic and heteroscedastic errors, and also a discrete misclassified covariate. Prediction performance is evaluated by the area under the receiver operating characteristic curve (AUC), the Brier score (BS), and the ratio of the observed to the expected number of events (calibration). In an extensive numerical study, we show that (i) the prediction model with the error‐prone covariate is very well calibrated, even when it is mis‐specified; (ii) using the error‐prone covariate instead of the true covariate can reduce the AUC and increase the BS dramatically; (iii) adding an auxiliary variable, which is correlated with the error‐prone covariate but conditionally independent of the outcome given all covariates in the true model, can improve the AUC and BS substantially. We conclude that reducing measurement error in covariates will improve the ensuing risk prediction, unless the association between the error‐free and error‐prone covariates is very high. Finally, we demonstrate how a validation study can be used to assess the effect of mismeasured covariates on risk prediction. These concepts are illustrated in a breast cancer risk prediction model developed in the Nurses' Health Study. Copyright © 2015 John Wiley & Sons, Ltd.
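
The three performance measures named in entry 18 are straightforward to compute from binary outcomes `y` and predicted risks `p`. The sketch below uses the empirical (Mann-Whitney) AUC with ties counted as one half, the mean squared error of the predicted risks for the Brier score, and observed events divided by expected events for calibration. The pairwise AUC loop is O(n²), which is acceptable for an illustration. The function names are hypothetical.

```python
from typing import Sequence


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Empirical AUC: probability that a randomly chosen event has a higher
    predicted risk than a randomly chosen non-event (ties count one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for a in pos:
        for b in neg:
            score += 1.0 if a > b else (0.5 if a == b else 0.0)
    return score / (len(pos) * len(neg))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def observed_expected_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Calibration: observed number of events over expected number of events."""
    return sum(y) / sum(p)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, p))                      # 0.75
    print(brier_score(y, p))              # about 0.158
    print(observed_expected_ratio(y, p))  # about 1.21
```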

19.
The dependence of longitudinal binary outcomes on covariates, and the covariation observed between them, are often modelled by (multivariate) logistic and probit models, respectively, assuming a specified association structure or random effects. Alternatively, latent class models may be used, which capture the covariation by assuming heterogeneity of the observational units regarding their reaction tendencies while postulating independence within classes. In the presence of a few categorical covariates, the multi-group method of latent class analysis allows one to relate the class sizes and the class-specific response probabilities to these covariates. Wheeze data from the Harvard Six-Cities study on respiratory health are a typical example of such a situation: at four occasions, the wheeze status of 537 children was examined, 187 of them exposed to maternal smoking and 350 not exposed. Thus, there is a single binary covariate (maternal smoking versus no maternal smoking), which makes the multi-group method of latent class analysis easily applicable. Based on a series of unrestricted and restricted models with up to three classes for each of the exposed and non-exposed subgroups, no statistically significant effect of maternal smoking on children's wheeze status could be substantiated. Moreover, no statistically significant difference at all could be shown between the two distributions of wheeze patterns collected from exposed and non-exposed children.

20.
When describing longitudinal binary response data, it may be desirable to estimate the cumulative probability of at least one positive response by some time point. For example, in phase I and II human immunodeficiency virus (HIV) vaccine trials, investigators are often interested in the probability of at least one vaccine-induced CD8+ cytotoxic T-lymphocyte (CTL) response to HIV proteins at different times over the course of the trial. In this setting, traditional estimates of the cumulative probabilities have been based on observed proportions. We show that if the missing data mechanism is ignorable, the traditional estimator of the cumulative success probabilities is biased and tends to underestimate a candidate vaccine's ability to induce CTL responses. As an alternative, we propose applying standard optimization techniques to obtain maximum likelihood estimates of the response profiles and, in turn, the cumulative probabilities of interest. Comparisons of the empirical and maximum likelihood estimates are investigated using data from simulations and HIV vaccine trials. We conclude that maximum likelihood offers a more accurate method of estimation, which is especially important in the HIV vaccine setting as cumulative CTL responses will likely be used as a key criterion for large scale efficacy trial qualification.
