Similar Articles
20 similar articles found.
1.
We explore the ‘reassessment’ design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non‐ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.

2.
This work is motivated by dose‐finding studies, where the number of events per subject within a specified study period forms the primary outcome. The aim of the considered studies is to identify the target dose for which the new drug can be shown to be as effective as a competitor medication. Given a pain‐related outcome, we expect a considerable number of patients to drop out before the end of the study period. The impact of missingness on the analysis and models for the missingness process must be carefully considered. The recurrent events are modeled as over‐dispersed Poisson process data, with dose as the regressor. Additional covariates may be included. Constant and time‐varying rate functions are examined. Based on these models, the impact of missingness on the precision of the target dose estimation is evaluated. Diverse models for the missingness process are considered, including dependence on covariates and number of events. The performances of five different analysis methods are assessed via simulations: a complete case analysis; two analyses using different single imputation techniques; a direct‐likelihood analysis and an analysis using pattern‐mixture models. The target dose estimation is robust if the same missingness process holds for the target dose group and the active control group. Furthermore, we demonstrate that this robustness is lost as soon as the missingness mechanisms for the active control and the target dose differ. Of the methods explored, the direct‐likelihood approach performs best, even when a missing not at random mechanism holds. Copyright © 2010 John Wiley & Sons, Ltd.

3.
We propose a joint model for longitudinal and survival data with time‐varying covariates subject to detection limits and intermittent missingness at random. The model is motivated by data from the Multicenter AIDS Cohort Study (MACS), in which HIV+ subjects have viral load and CD4 cell count measured at repeated visits along with survival data. We model the longitudinal component using a normal linear mixed model, modeling the trajectory of CD4 cell count by regressing on viral load and other covariates. The viral load data are subject to both left censoring because of detection limits (17%) and intermittent missingness (27%). The survival component of the joint model is a Cox model with time‐dependent covariates for death because of AIDS. The longitudinal and survival models are linked using the trajectory function of the linear mixed model. A Bayesian analysis is conducted on the MACS data using the proposed joint model. The proposed method is shown to improve the precision of estimates when compared with alternative methods. Copyright © 2014 John Wiley & Sons, Ltd.

4.
In this paper we consider longitudinal studies in which the outcome to be measured over time is binary, and the covariates of interest are categorical. In longitudinal studies it is common for the outcomes and any time-varying covariates to be missing due to missed study visits, resulting in non-monotone patterns of missingness. Moreover, the reasons for missed visits may be related to the specific values of the response and/or covariates that should have been obtained, i.e. missingness is non-ignorable. With non-monotone non-ignorable missing response and covariate data, a full likelihood approach is quite complicated, and maximum likelihood estimation can be computationally prohibitive when there are many occasions of follow-up. Furthermore, the full likelihood must be correctly specified to obtain consistent parameter estimates. We propose a pseudo-likelihood method for jointly estimating the covariate effects on the marginal probabilities of the outcomes and the parameters of the missing data mechanism. The pseudo-likelihood requires specification of the marginal distributions of the missingness indicator, outcome, and possibly missing covariates at each occasion, but avoids making assumptions about the joint distribution of the data at two or more occasions. Thus, the proposed method can be considered semi-parametric. The proposed method is an extension of the pseudo-likelihood approach in Troxel et al. to handle binary responses and possibly missing time-varying covariates. The method is illustrated using data from the Six Cities study, a longitudinal study of the health effects of air pollution.

5.
Combining information from multiple data sources can enhance estimates of health‐related measures by using one source to supply information that is lacking in another, assuming the former has accurate and complete data. However, there is little research conducted on combining methods when each source might be imperfect, for example, subject to measurement errors and/or missing data. In a multisite study of hospice‐use by late‐stage cancer patients, this variable was available from patients’ abstracted medical records, which may be considerably underreported because of incomplete acquisition of these records. Therefore, data for Medicare‐eligible patients were supplemented with their Medicare claims that contained information on hospice‐use, which may also be subject to underreporting, though to a lesser degree. In addition, both sources suffered from missing data because of unit nonresponse from medical record abstraction and sample undercoverage for Medicare claims. We treat the true hospice‐use status from these patients as a latent variable and propose to multiply impute it using information from both data sources, borrowing the strength from each. We characterize the complete‐data model as a product of an ‘outcome’ model for the probability of hospice‐use and a ‘reporting’ model for the probability of underreporting from both sources, adjusting for other covariates. Assuming the reports of hospice‐use from both sources are missing at random and that underreporting is conditionally independent across the two sources, we develop a Bayesian multiple imputation algorithm and conduct multiple imputation analyses of patient hospice‐use in demographic and clinical subgroups. The proposed approach yields more sensible results than alternative methods in our example. Our model is also related to dual system estimation in population censuses and dual exposure assessment in epidemiology. Copyright © 2014 John Wiley & Sons, Ltd.

6.
Propensity scores have been used widely as a bias reduction method to estimate the treatment effect in nonrandomized studies. Since many covariates are generally included in the model for estimating the propensity scores, the proportion of subjects with at least one missing covariate could be large. While many methods have been proposed for propensity score‐based estimation in the presence of missing covariates, little has been published comparing the performance of these methods. In this article we propose a novel method called multiple imputation missingness pattern (MIMP) and compare it with the naive estimator (ignoring propensity score) and three commonly used methods of handling missing covariates in propensity score‐based estimation (separate estimation of propensity scores within each pattern of missing data, multiple imputation and discarding missing data) under different mechanisms of missing data and degree of correlation among covariates. Simulation shows that all adjusted estimators are much less biased than the naive estimator. Under certain conditions MIMP provides benefits (smaller bias and mean‐squared error) compared with existing alternatives. Copyright © 2009 John Wiley & Sons, Ltd.
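The simplest of the strategies compared above — estimating propensity scores separately within each pattern of missing data and then weighting — can be sketched as follows. This is a toy, nonparametric illustration on synthetic records, not the MIMP method itself; the stratum definitions and variable names are invented for the example.

```python
# Pattern-stratified propensity scores, a minimal sketch (synthetic data).
# Records with covariate x observed are stratified by x; records with x
# missing form their own pattern. The propensity score e = P(T=1 | stratum)
# is then estimated nonparametrically and used in an IPTW estimate of the ATE.

def pattern_stratum(rec):
    """Stratum key: observed records split by x; missing x is its own pattern."""
    return ("obs", rec["x"]) if rec["x"] is not None else ("miss",)

def iptw_ate(records):
    # Estimate e(stratum) = P(T=1 | stratum) as the treated fraction.
    strata = {}
    for r in records:
        strata.setdefault(pattern_stratum(r), []).append(r)
    e = {k: sum(r["t"] for r in v) / len(v) for k, v in strata.items()}
    # Hajek (normalized) IPTW estimator of E[Y(1)] - E[Y(0)].
    w1 = [(r["y"] / e[pattern_stratum(r)], 1 / e[pattern_stratum(r)])
          for r in records if r["t"] == 1]
    w0 = [(r["y"] / (1 - e[pattern_stratum(r)]), 1 / (1 - e[pattern_stratum(r)]))
          for r in records if r["t"] == 0]
    mu1 = sum(a for a, _ in w1) / sum(b for _, b in w1)
    mu0 = sum(a for a, _ in w0) / sum(b for _, b in w0)
    return mu1 - mu0

records = (
    [{"x": 0, "t": t, "y": y} for t, y in [(1, 1), (1, 1), (0, 0), (0, 0)]]
    + [{"x": 1, "t": t, "y": y} for t, y in [(1, 3), (1, 3), (0, 2), (0, 2)]]
    + [{"x": None, "t": t, "y": y} for t, y in [(1, 2), (1, 2), (0, 1), (0, 1)]]
)
ate = iptw_ate(records)
```

With these balanced toy strata the estimated treatment effect is 1; real applications would model e with a logistic regression per pattern rather than strata means.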

7.
Phenotyping, i.e., identification of patients possessing a characteristic of interest, is a fundamental task for research conducted using electronic health records. However, challenges to this task include imperfect sensitivity and specificity of clinical codes and inconsistent availability of more detailed data such as laboratory test results. Despite these challenges, most existing electronic health records–derived phenotypes are rule-based, consisting of a series of Boolean arguments informed by expert knowledge of the disease of interest and its coding. The objective of this paper is to introduce a Bayesian latent phenotyping approach that accounts for imperfect data elements and missing not at random missingness patterns that can be used when no gold-standard data are available. We conducted simulation studies to compare alternative phenotyping methods under different patterns of missingness and applied these approaches to a cohort of 68 265 children at elevated risk for type 2 diabetes mellitus (T2DM). In simulation studies, the latent class approach had similar sensitivity to a rule-based approach (95.9% vs 91.9%) while substantially improving specificity (99.7% vs 90.8%). In the PEDSnet cohort, we found that biomarkers and clinical codes were strongly associated with latent T2DM status. The latent T2DM class was also strongly predictive of missingness in biomarkers. Glucose was missing in 83.4% of patients (odds ratio for latent T2DM status = 0.52) while hemoglobin A1c was missing in 91.2% (odds ratio for latent T2DM status = 0.03), suggesting missing not at random missingness. The latent phenotype approach may substantially improve on rule-based phenotyping.

8.
Causal inference with observational longitudinal data and time‐varying exposures is complicated due to the potential for time‐dependent confounding and unmeasured confounding. Most causal inference methods that handle time‐dependent confounding rely on either the assumption of no unmeasured confounders or the availability of an unconfounded variable that is associated with the exposure (eg, an instrumental variable). Furthermore, when data are incomplete, validity of many methods often depends on the assumption of missing at random. We propose an approach that combines a parametric joint mixed‐effects model for the study outcome and the exposure with g‐computation to identify and estimate causal effects in the presence of time‐dependent confounding and unmeasured confounding. G‐computation can estimate participant‐specific or population‐average causal effects using parameters of the joint model. The joint model is a type of shared parameter model where the outcome and exposure‐selection models share common random effect(s). We also extend the joint model to handle missing data and truncation by death when missingness is possibly not at random. We evaluate the performance of the proposed method using simulation studies and compare the method to both linear mixed‐ and fixed‐effects models combined with g‐computation as well as to targeted maximum likelihood estimation. We apply the method to an epidemiologic study of vitamin D and depressive symptoms in older adults and include code using SAS PROC NLMIXED software to enhance the accessibility of the method to applied researchers.
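The g‐computation (standardization) step described above can be sketched in isolation. The linear outcome model and its coefficients below are hypothetical stand‐ins for the paper's joint mixed‐effects model; the point is only the shape of the computation — predict every participant's outcome under each exposure level, then average.

```python
# Minimal g-computation sketch under an assumed, already-fitted outcome model
# E[Y | A, L] = b0 + b1*A + b2*L. Coefficients and covariate values are
# hypothetical; in the paper the predictions come from a joint mixed-effects
# model, but the standardization step has the same form.

def g_computation_ate(covariates, predict, a1=1, a0=0):
    """Population-average effect: mean prediction under A=a1 minus under A=a0."""
    mu1 = sum(predict(a1, l) for l in covariates) / len(covariates)
    mu0 = sum(predict(a0, l) for l in covariates) / len(covariates)
    return mu1 - mu0

b0, b1, b2 = 1.0, 2.0, 0.5            # hypothetical fitted coefficients
predict = lambda a, l: b0 + b1 * a + b2 * l
covariates = [0.0, 1.0, 2.0, 3.0]     # observed confounder values L
ate = g_computation_ate(covariates, predict)
```

Under a linear model with no interaction the standardized effect reduces to the exposure coefficient b1, which is a convenient sanity check for the code.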

9.
We propose a semiparametric marginal modeling approach for longitudinal analysis of cohorts with data missing due to death and non‐response to estimate regression parameters interpreted as conditioned on being alive. Our proposed method accommodates outcomes and time‐dependent covariates that are missing not at random with non‐monotone missingness patterns via inverse‐probability weighting. Missing covariates are replaced by consistent estimates derived from a simultaneously solved inverse‐probability‐weighted estimating equation. Thus, we utilize data points with the observed outcomes and missing covariates beyond the estimated weights while avoiding numerical methods to integrate over missing covariates. The approach is applied to a cohort of elderly female hip fracture patients to estimate the prevalence of walking disability over time as a function of body composition, inflammation, and age. Copyright © 2010 John Wiley & Sons, Ltd.
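The inverse‐probability‐weighting idea underlying this approach can be sketched as follows, with the response probabilities taken as known rather than estimated from a simultaneously solved estimating equation; all values are synthetic.

```python
# Hajek-style IPW sketch for an outcome missing for some subjects: observed
# outcomes are up-weighted by 1/pi_i, where pi_i = P(observed | covariates).
# Here the pi_i are assumed known; in practice they come from a fitted
# missingness model. All numbers below are synthetic.

def ipw_mean(ys, observed, pi):
    """IPW estimate of E[Y] from partially observed outcomes."""
    num = sum(y / p for y, r, p in zip(ys, observed, pi) if r)
    den = sum(1 / p for y, r, p in zip(ys, observed, pi) if r)
    return num / den

ys       = [2.0, 4.0, 6.0, 8.0]   # outcomes (the last is never seen)
observed = [1, 1, 1, 0]           # response indicators
pi       = [1.0, 1.0, 0.5, 0.5]   # assumed-known response probabilities
est = ipw_mean(ys, observed, pi)
```

The subject observed with probability 0.5 counts twice, compensating on average for similar subjects who went unobserved; with these toy numbers the weighted estimate is 4.5.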

10.
We present a case study in the analysis of the prognostic effects of anaemia and other covariates on the local recurrence of head and neck cancer in patients who have been treated with radiation therapy. Because it is believed that a large fraction of the patients are cured by the therapy, we use a failure time mixture model for the outcomes, which simultaneously models both the relationship of the covariates to cure and the relationship of the covariates to local recurrence times for subjects who are not cured. A problematic feature of the data is that two covariates of interest have missing values, so that only 75 per cent of the subjects have complete data. We handle the missing-data problem by jointly modelling the covariates and the outcomes, and then fitting the model to all of the data, including the incomplete cases. We compare our approach to two traditional methods for handling missingness, that is, complete-case analysis and the use of an indicator variable for missingness. The comparison with complete-case analysis demonstrates gains in efficiency for joint modelling as well as sensitivity of some results to the method used to handle missing data. The use of an indicator variable yields results that are very similar to those from joint modelling for our data. We also compare the results obtained for the mixture model with results obtained for a standard (non-mixture) survival model. It is seen that the mixture model separates out effects in a way that is not possible with a standard survival model. In particular, conditional on other covariates, we find strong evidence of an association between anaemia and cure, whereas the evidence of an association between anaemia and time to local recurrence for patients who are not cured is weaker.

11.
Latent trait shared-parameter mixed models for ecological momentary assessment (EMA) data containing missing values are developed in which data are collected in an intermittent manner. In such studies, data are often missing due to unanswered prompts. Using item response theory models, a latent trait is used to represent the missing prompts and modeled jointly with a mixed model for bivariate longitudinal outcomes. Both one- and two-parameter latent trait shared-parameter mixed models are presented. These new models offer a unique way to analyze missing EMA data with many response patterns. Here, the proposed models represent missingness via a latent trait that corresponds to the students' “ability” to respond to the prompting device. Data containing more than 10 300 observations from an EMA study involving high school students' positive and negative affects are presented. The latent trait representing missingness was a significant predictor of both positive affect and negative affect outcomes. The models are compared to a missing at random mixed model. A simulation study indicates that the proposed models can provide lower bias and increased efficiency compared to the standard missing at random approach commonly used with intermittent missing longitudinal data.

12.
Causal inference has been widely conducted in various fields and many methods have been proposed for different settings. However, for noisy data with both mismeasurements and missing observations, those methods often break down. In this paper, we consider settings in which binary outcomes are subject to both missingness and misclassification and the interest is in estimating the average treatment effect (ATE). We examine the asymptotic biases caused by ignoring missingness and/or misclassification and establish the intrinsic connections between missingness effects and misclassification effects on the estimation of ATE. We develop valid weighted estimation methods to simultaneously correct for missingness and misclassification effects. To provide protection against model misspecification, we further propose a doubly robust correction method which yields consistent estimators when either the treatment model or the outcome model is misspecified. Simulation studies are conducted to assess the performance of the proposed methods. An application to smoking cessation data is reported to illustrate the use of the proposed methods.

13.
Multiple imputation is a strategy for analyzing incomplete data that mitigates the impact of missingness on the power and bias of estimates. When data from multiple studies are collated, we can propose both within‐study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse‐variance weighted meta‐analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between‐study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse‐variance weighted meta‐analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta‐analysis, rather than meta‐analyzing each of the multiple imputations and then combining the meta‐analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
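Rubin's rules, applied here at the study level as the abstract recommends (pool the m completed‐data estimates within a study before any meta‐analysis), combine a point estimate and a total variance as sketched below with toy numbers.

```python
# Rubin's rules for pooling m completed-data analyses of one study.
# qbar is the pooled estimate; total variance T adds the within-imputation
# variance W and the (finite-m corrected) between-imputation variance B.
# The estimates and variances below are toy numbers, not study results.

def rubins_rules(estimates, variances):
    m = len(estimates)
    qbar = sum(estimates) / m                                # pooled estimate
    w = sum(variances) / m                                   # within-imputation
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)    # between-imputation
    t = w + (1 + 1 / m) * b                                  # total variance
    return qbar, t

qbar, t = rubins_rules([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The pooled study-level pairs (qbar, T) then feed an ordinary inverse‐variance weighted meta‐analysis.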

14.
Missing outcome data is a crucial threat to the validity of treatment effect estimates from randomized trials. The outcome distributions of participants with missing and observed data are often different, which increases bias. Causal inference methods may aid in reducing the bias and improving efficiency by incorporating baseline variables into the analysis. In particular, doubly robust estimators incorporate two nuisance parameters: the outcome regression and the missingness mechanism (ie, the probability of missingness conditional on treatment assignment and baseline variables), to adjust for differences in the observed and unobserved groups that can be explained by observed covariates. To consistently estimate the treatment effect, one of these nuisance parameters must be consistently estimated. Traditionally, nuisance parameters are estimated using parametric models, which often precludes consistency, particularly in moderate to high dimensions. Recent research on missing data has focused on data‐adaptive estimation to help achieve consistency, but the large sample properties of such methods are poorly understood. In this article, we discuss a doubly robust estimator that is consistent and asymptotically normal under data‐adaptive estimation of the nuisance parameters. We provide a formula for an asymptotically exact confidence interval under minimal assumptions. We show that our proposed estimator has smaller finite‐sample bias compared to standard doubly robust estimators. We present a simulation study demonstrating the enhanced performance of our estimators in terms of bias, efficiency, and coverage of the confidence intervals. We present the results of an illustrative example: a randomized, double‐blind phase 2/3 trial of antiretroviral therapy in HIV‐infected persons.
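A standard doubly robust (augmented inverse‐probability‐weighted) estimate of a mean outcome with missing data has the following shape. The nuisance values below are supplied directly as hypothetical numbers; the paper's contribution concerns their data‐adaptive estimation, which this sketch does not attempt.

```python
# AIPW sketch: combines an outcome-regression prediction mu_i with an
# inverse-probability correction. The estimate is consistent if either the
# mu model or the pi model is correct. Nuisance values are hypothetical.

def aipw_mean(ys, observed, pi, mu):
    n = len(ys)
    total = 0.0
    for y, r, p, m in zip(ys, observed, pi, mu):
        y = y if r else 0.0  # unobserved outcomes never enter the first term
        total += r * y / p - (r - p) / p * m
    return total / n

ys       = [2.0, 4.0, 0.0, 8.0]   # third outcome missing (value unused)
observed = [1, 1, 0, 1]           # missingness indicators
pi       = [1.0, 0.8, 0.5, 0.8]   # P(observed | treatment, baseline), assumed
mu       = [2.0, 4.0, 6.0, 8.0]   # outcome-model predictions, assumed
est = aipw_mean(ys, observed, pi, mu)
```

Because the mu values here equal the true outcomes, the augmentation term fills in the missing subject exactly and the estimate recovers the full-data mean of 5.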

15.
Survival analysis has been conventionally performed on a continuous time scale. In practice, the survival time is often recorded or handled on a discrete scale; when this is the case, discrete-time survival analysis provides results more relevant to the actual data scale. In addition, data on time-dependent covariates in the survival analysis are usually collected through intermittent follow-ups, resulting in missing and mismeasured covariate data. In this work, we propose the sufficient discrete hazard (SDH) approach to discrete-time survival analysis with longitudinal covariates that are subject to missingness and mismeasurement. The SDH method employs the conditional score idea available for dealing with mismeasured covariates, and the penalized least squares for estimating the missing covariate value using the regression spline basis. The SDH method is developed for the single event analysis with the logistic discrete hazard model, and for the competing risks analysis with the multinomial logit model. Simulation results reveal good finite-sample performance of the proposed estimator, consistent with the associated asymptotic theory. The proposed SDH method is applied to the scleroderma lung study data, where the time to medication withdrawal and time to death were recorded discretely in months, for illustration.
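The discrete-time setup rests on the person-period expansion, after which the discrete hazard h(t) = P(T = t | T ≥ t) can be fit by ordinary logistic regression of the event indicator on time and covariates. The sketch below shows only that expansion step on toy data, not the paper's SDH corrections for missing or mismeasured covariates.

```python
# Person-period expansion for discrete-time survival analysis (toy data).
# Each subject contributes one row per discrete period at risk; the event
# indicator d is 1 only in the final period of a subject who has the event.

def person_period(subjects):
    """subjects: list of (followup_time, event) with time on a discrete grid."""
    rows = []
    for sid, (time, event) in enumerate(subjects):
        for t in range(1, time + 1):
            rows.append({"id": sid, "t": t, "d": int(event and t == time)})
    return rows

# subject 0 has the event at t=3; subject 1 is censored at t=2
rows = person_period([(3, 1), (2, 0)])
```

The resulting rows can be passed directly to any logistic regression routine, which is what makes the discrete-hazard formulation convenient.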

16.
Quality-of-life (QOL) is an important outcome in clinical research, particularly in cancer clinical trials. Typically, data are collected longitudinally from patients during treatment and subsequent follow-up. Missing data are a common problem, and missingness may arise in a non-ignorable fashion. In particular, the probability that a patient misses an assessment may depend on the patient's QOL at the time of the scheduled assessment. We propose a Markov chain model for the analysis of categorical outcomes derived from QOL measures. Our model assumes that transitions between QOL states depend on covariates through generalized logit models or proportional odds models. To account for non-ignorable missingness, we incorporate logistic regression models for the conditional probabilities of observing measurements, given their actual values. The model can accommodate time-dependent covariates. Estimation is by maximum likelihood, summing over all possible values of the missing measurements. We describe options for selecting parsimonious models, and we study the finite-sample properties of the estimators by simulation. We apply the techniques to data from a breast cancer clinical trial in which QOL assessments were made longitudinally, and in which missing data frequently arose.

17.
Purpose: The aim of this research was to examine, in an exploratory manner, whether cross-sectional multiple imputation generates valid parameter estimates for a latent growth curve model in a longitudinal data set with nonmonotone missingness.
Methods: A simulated longitudinal data set of N = 5000 was generated and consisted of a continuous dependent variable, assessed at three measurement occasions, and a categorical time-invariant independent variable. Missing data had a nonmonotone pattern and the proportion of missingness increased from the initial to the final measurement occasion (5%–20%). Three methods were considered to deal with missing data: listwise deletion, full-information maximum likelihood, and multiple imputation. A latent growth curve model was specified and analysis of variance was used to compare parameter estimates between the full data set and missing data approaches.
Results: Multiple imputation resulted in significantly lower slope variance compared with the full data set. There were no differences in any parameter estimates between the multiple imputation and full-information maximum likelihood approaches.
Conclusions: This study suggested that in longitudinal studies with nonmonotone missingness, cross-sectional imputation at each time point may be viable and produces estimates comparable with those obtained with full-information maximum likelihood. Future research pursuing the validity of this method is warranted.

18.
Multiple imputation is a popular method for addressing missing data, but its implementation is difficult when data have a multilevel structure and one or more variables are systematically missing. This systematic missing data pattern may commonly occur in meta‐analysis of individual participant data, where some variables are never observed in some studies, but are present in other hierarchical data settings. In these cases, valid imputation must account for both relationships between variables and correlation within studies. Proposed methods for multilevel imputation include specifying a full joint model and multiple imputation with chained equations (MICE). While MICE is attractive for its ease of implementation, there is little existing work describing conditions under which this is a valid alternative to specifying the full joint model. We present results showing that for multilevel normal models, MICE is rarely exactly equivalent to joint model imputation. Through a simulation study and an example using data from a traumatic brain injury study, we found that in spite of theoretical differences, MICE imputations often produce results similar to those obtained using the joint model. We also assess the influence of prior distributions in MICE imputation methods and find that when missingness is high, prior choices in MICE models tend to affect estimation of across‐study variability more than compatibility of conditional likelihoods. Copyright © 2017 John Wiley & Sons, Ltd.

19.
A variable is ‘systematically missing’ if it is missing for all individuals within particular studies in an individual participant data meta‐analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure–disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta‐analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between‐study random effects in the imputation model. Copyright © 2013 John Wiley & Sons, Ltd.

20.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
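The FICO index named above is straightforward to compute; a minimal sketch with toy records follows, using None to mark missing values. The record layout and variable names are invented for the example.

```python
# FICO sketch: among records where covariate x is observed, the fraction that
# are nevertheless incomplete cases (missing some other analysis variable).
# A high FICO suggests MI has much to gain over complete-case analysis for x.

def fico(records, covariate, other_vars):
    """Fraction of Incomplete Cases among the Observed values of `covariate`."""
    observed = [r for r in records if r[covariate] is not None]
    incomplete = [r for r in observed
                  if any(r[v] is None for v in other_vars)]
    return len(incomplete) / len(observed)

records = [
    {"x": 1.0, "z": 2.0}, {"x": 0.5, "z": None},
    {"x": None, "z": 1.0}, {"x": 2.0, "z": 3.0},
]
index = fico(records, "x", ["z"])
```

Here three records have x observed and one of them is missing z, so the index is 1/3: a third of the observed x values are recoverable for analysis only through imputation of z.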
