
1.

Bias and efficiency of multiple imputation compared with complete‐case analysis for missing covariate values

Ian R. White, John B. Carlin 《Statistics in medicine》2010,29(28):2920-2931

When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
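The FICO index named at the end of this abstract can be illustrated with a small Python sketch. It reads the definition literally: among the individuals whose value of a given covariate is observed, it counts the fraction who are incomplete cases because some other variable is missing. The function name `fico`, the use of `None` to mark a missing value, and the toy records are assumptions made for illustration; they are not taken from the paper.

```python
def fico(rows, covariate):
    """Fraction of incomplete cases among rows where `covariate` is observed.

    `rows` is a list of dicts sharing the same keys; None marks a missing value
    (an illustrative convention, not the paper's notation).
    """
    observed = [r for r in rows if r[covariate] is not None]
    if not observed:
        raise ValueError(f"covariate {covariate!r} is never observed")
    # A row is an incomplete case if any of its variables is missing.
    incomplete = [r for r in observed if any(v is None for v in r.values())]
    return len(incomplete) / len(observed)


# Toy data: x is observed in three rows, two of which are missing z.
toy_rows = [
    {"x": 1.0, "z": 2.0},
    {"x": 1.5, "z": None},
    {"x": None, "z": 3.0},
    {"x": 2.0, "z": None},
]
fico_x = fico(toy_rows, "x")
fico_z = fico(toy_rows, "z")
```

A larger value means that more of the observed information on that covariate is discarded by a complete-case analysis, which is where the abstract expects MI to recover precision.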

2.

Latent trait shared-parameter mixed models for missing ecological momentary assessment data

John F. Cursio, Robin J. Mermelstein, Donald Hedeker 《Statistics in medicine》2019,38(4):660-673

Latent trait shared-parameter mixed models for ecological momentary assessment (EMA) data containing missing values are developed in which data are collected in an intermittent manner. In such studies, data are often missing due to unanswered prompts. Using item response theory models, a latent trait is used to represent the missing prompts and modeled jointly with a mixed model for bivariate longitudinal outcomes. Both one- and two-parameter latent trait shared-parameter mixed models are presented. These new models offer a unique way to analyze missing EMA data with many response patterns. Here, the proposed models represent missingness via a latent trait that corresponds to the students' “ability” to respond to the prompting device. Data containing more than 10 300 observations from an EMA study involving high school students' positive and negative affects are presented. The latent trait representing missingness was a significant predictor of both positive affect and negative affect outcomes. The models are compared to a missing at random mixed model. A simulation study indicates that the proposed models can provide lower bias and increased efficiency compared to the standard missing at random approach commonly used with intermittent missing longitudinal data.

3.

Pseudo-likelihood methods for longitudinal binary data with non-ignorable missing responses and covariates

Parzen M, Lipsitz SR, Fitzmaurice GM, Ibrahim JG, Troxel A 《Statistics in medicine》2006,25(16):2784-2796

In this paper we consider longitudinal studies in which the outcome to be measured over time is binary, and the covariates of interest are categorical. In longitudinal studies it is common for the outcomes and any time-varying covariates to be missing due to missed study visits, resulting in non-monotone patterns of missingness. Moreover, the reasons for missed visits may be related to the specific values of the response and/or covariates that should have been obtained, i.e. missingness is non-ignorable. With non-monotone non-ignorable missing response and covariate data, a full likelihood approach is quite complicated, and maximum likelihood estimation can be computationally prohibitive when there are many occasions of follow-up. Furthermore, the full likelihood must be correctly specified to obtain consistent parameter estimates. We propose a pseudo-likelihood method for jointly estimating the covariate effects on the marginal probabilities of the outcomes and the parameters of the missing data mechanism. The pseudo-likelihood requires specification of the marginal distributions of the missingness indicator, outcome, and possibly missing covariates at each occasion, but avoids making assumptions about the joint distribution of the data at two or more occasions. Thus, the proposed method can be considered semi-parametric. The proposed method is an extension of the pseudo-likelihood approach in Troxel et al. to handle binary responses and possibly missing time-varying covariates. The method is illustrated using data from the Six Cities study, a longitudinal study of the health effects of air pollution.

4.

A comparison of hospital performance with non-ignorable missing covariates: an application to trauma care data

Kirkham JJ 《Statistics in medicine》2008,27(27):5725-5744

Trauma is a term used in medicine for describing physical injury. The prospective evaluation of the care of injured patients aims to improve the management of a trauma system and acts as an ongoing audit of trauma care. One of the principal techniques used to evaluate the effectiveness of trauma care at different hospitals is through a comparative outcome analysis. In such an analysis, a national 'league table' can be compiled to determine which hospitals are better at managing trauma care. One of the problems with the conventional analysis is that key covariates for measuring physiological injury can often be missing. It is also hypothesized that this missingness is not missing at random (NMAR). We describe the methods used to assess the performance of hospitals in a trauma setting and implement the method of weights for generalized linear models to account for the missing covariate data, when we suspect the missing data mechanism is NMAR using a Monte Carlo EM algorithm. Through simulation work and application to the trauma data we demonstrate the effect the missing covariate data can have on the performance of hospitals and how the conclusions we draw from the analysis can differ. We highlight the differences in hospital performance and the ranking of hospitals.

5.

Joint Longitudinal Models for Dealing With Missing at Random Data in Trial-Based Economic Evaluations

Andrea Gabrio, Rachael Hunter, Alexina J. Mason, Gianluca Baio 《Value in health》2021,24(5):699-706

Objectives: In trial-based economic evaluation, some individuals are typically associated with missing data at some time point, so that their corresponding aggregated outcomes (eg, quality-adjusted life-years) cannot be evaluated. Restricting the analysis to the complete cases is inefficient and can result in biased estimates, while imputation methods are often implemented under a missing at random (MAR) assumption. We propose the use of joint longitudinal models to extend standard approaches by taking into account the longitudinal structure to improve the estimation of the targeted quantities under MAR. Methods: We compare the results from methods that handle missingness at an aggregated (case deletion, baseline imputation, and joint aggregated models) and disaggregated (joint longitudinal models) level under MAR. The methods are compared using a simulation study and applied to data from 2 real case studies. Results: Simulations show that, according to which data affect the missingness process, aggregated methods may lead to biased results, while joint longitudinal models lead to valid inferences under MAR. The analysis of the 2 case studies supports these results as both parameter estimates and cost-effectiveness results vary based on the amount of data incorporated into the model. Conclusions: Our analyses suggest that methods implemented at the aggregated level are potentially biased under MAR as they ignore the information from the partially observed follow-up data. This limitation can be overcome by extending the analysis to a longitudinal framework using joint models, which can incorporate all the available evidence.

6.

Semiparametric regression models for repeated measures of mortal cohorts with non‐monotone missing outcomes and time‐dependent covariates

Michelle Shardell, Gregory E. Hicks, Ram R. Miller, Jay Magaziner 《Statistics in medicine》2010,29(22):2282-2296

We propose a semiparametric marginal modeling approach for longitudinal analysis of cohorts with data missing due to death and non‐response to estimate regression parameters interpreted as conditioned on being alive. Our proposed method accommodates outcomes and time‐dependent covariates that are missing not at random with non‐monotone missingness patterns via inverse‐probability weighting. Missing covariates are replaced by consistent estimates derived from a simultaneously solved inverse‐probability‐weighted estimating equation. Thus, we utilize data points with the observed outcomes and missing covariates beyond the estimated weights while avoiding numerical methods to integrate over missing covariates. The approach is applied to a cohort of elderly female hip fracture patients to estimate the prevalence of walking disability over time as a function of body composition, inflammation, and age. Copyright © 2010 John Wiley & Sons, Ltd.

7.

Combining multiple imputation and meta‐analysis with individual participant data

Stephen Burgess, Ian R. White, Matthieu Resche‐Rigon, Angela M. Wood 《Statistics in medicine》2013,32(26):4499-4514

Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within‐study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse‐variance weighted meta‐analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between‐study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse‐variance weighted meta‐analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta‐analysis, rather than meta‐analyzing each of the multiple imputations and then combining the meta‐analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
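The recommended order of operations in this abstract (Rubin's rules within each study first, inverse-variance weighting across studies second) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the standard Rubin's rules for a scalar estimate, a fixed-effect inverse-variance pooling step chosen for brevity (the paper also discusses between-study heterogeneity, which this sketch ignores), and made-up numbers. The function names `rubin_pool` and `inverse_variance_meta` are hypothetical.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Combine m imputation-specific estimates from ONE study with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def inverse_variance_meta(study_results):
    """Fixed-effect inverse-variance pooling of (estimate, variance) pairs."""
    weights = [1.0 / var for _, var in study_results]
    pooled = sum(w * est for w, (est, _) in zip(weights, study_results)) / sum(weights)
    return pooled, 1.0 / sum(weights)


# Step 1: Rubin's rules at the study level (illustrative numbers only).
study_a = rubin_pool([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
study_b = rubin_pool([4.0, 4.0], [2.0, 2.0])

# Step 2: meta-analyse the study-level pooled estimates.
meta_estimate, meta_variance = inverse_variance_meta([study_a, study_b])
```

The alternative that the abstract advises against would run the meta-analysis once per imputed data set and apply Rubin's rules to the resulting meta-analysis estimates.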

8.

Causal inference with noisy data: Bias analysis and estimation approaches to simultaneously addressing missingness and misclassification in binary outcomes

Di Shu, Grace Y. Yi 《Statistics in medicine》2020,39(4):456-468

Causal inference has been widely conducted in various fields and many methods have been proposed for different settings. However, for noisy data with both mismeasurements and missing observations, those methods often break down. In this paper, we consider a problem in which binary outcomes are subject to both missingness and misclassification, when the interest is in estimation of the average treatment effects (ATE). We examine the asymptotic biases caused by ignoring missingness and/or misclassification and establish the intrinsic connections between missingness effects and misclassification effects on the estimation of ATE. We develop valid weighted estimation methods to simultaneously correct for missingness and misclassification effects. To provide protection against model misspecification, we further propose a doubly robust correction method which yields consistent estimators when either the treatment model or the outcome model (but not both) is misspecified. Simulation studies are conducted to assess the performance of the proposed methods. An application to smoking cessation data is reported to illustrate the use of the proposed methods.

9.

A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis

Wallace ML, Anderson SJ, Mazumdar S 《Statistics in medicine》2010,29(29):3004-3016

Missing covariate data present a challenge to tree-structured methodology due to the fact that a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults.

10.

Missing data strategies for time-varying confounders in comparative effectiveness studies of non-missing time-varying exposures and right-censored outcomes

Manisha Desai, Maria E. Montez-Rath, Kristopher Kapphahn, Vilija R. Joyce, Maya B. Mathur, Ariadna Garcia, Natasha Purington, Douglas K. Owens 《Statistics in medicine》2019,38(17):3204-3220

The treatment of missing data in comparative effectiveness studies with right-censored outcomes and time-varying covariates is challenging because of the multilevel structure of the data. In particular, the performance of an accessible method like multiple imputation (MI) under an imputation model that ignores the multilevel structure is unknown and has not been compared to complete-case (CC) and single imputation methods that are most commonly applied in this context. Through an extensive simulation study, we compared statistical properties among CC analysis, last value carried forward, mean imputation, the use of missing indicators, and MI-based approaches with and without auxiliary variables under an extended Cox model when the interest lies in characterizing relationships between non-missing time-varying exposures and right-censored outcomes. MI demonstrated favorable properties under a moderate missing-at-random condition (absolute bias <0.1) and outperformed CC and single imputation methods, even when the MI method did not account for correlated observations in the imputation model. The performance of MI decreased with increasing complexity such as when the missing data mechanism involved the exposure of interest, but was still preferred over other methods considered and performed well in the presence of strong auxiliary variables. We recommend considering MI that ignores the multilevel structure in the imputation model when data are missing in a time-varying confounder, incorporating variables associated with missingness in the MI models as well as conducting sensitivity analyses across plausible assumptions.

11.

Bayesian quantile regression‐based nonlinear mixed‐effects joint models for time‐to‐event and longitudinal data with multiple features

Yangxin Huang, Jiaqing Chen 《Statistics in medicine》2016,35(30):5666-5685

This article explores Bayesian joint models for a quantile of longitudinal response, mismeasured covariate and event time outcome with an attempt to (i) characterize the entire conditional distribution of the response variable based on quantile regression that may be more robust to outliers and misspecification of error distribution; (ii) tailor accuracy from measurement error, evaluate non‐ignorable missing observations, and adjust departures from normality in covariate; and (iii) overcome shortages of confidence in specifying a time‐to‐event model. When statistical inference is carried out for a longitudinal data set with non‐central location, non‐linearity, non‐normality, measurement error, and missing values, as well as an interval‐censored event time, it is important to treat these data features simultaneously in order to obtain more reliable and robust inferential results. Toward this end, we develop a Bayesian joint modeling approach to simultaneously estimate all parameters in the three models: a quantile regression‐based nonlinear mixed‐effects model for the response using an asymmetric Laplace distribution, a linear mixed‐effects model with a skew‐t distribution for the mismeasured covariate in the presence of informative missingness, and an accelerated failure time model with an unspecified nonparametric distribution for the event time. We apply the proposed modeling approach to an AIDS clinical data set and conduct simulation studies to assess the performance of the proposed joint models and method. Copyright © 2016 John Wiley & Sons, Ltd.

12.

Incorporating missingness for estimation of marginal regression models with multiple source predictors

Litman HJ, Horton NJ, Hernández B, Laird NM 《Statistics in medicine》2007,26(5):1055-1068

Multiple informant data refers to information obtained from different individuals or sources used to measure the same construct; for example, researchers might collect information regarding child psychopathology from the child's teacher and the child's parent. Frequently, studies with multiple informants have incomplete observations; in some cases the missingness of informants is substantial. We introduce a Maximum Likelihood (ML) technique to fit models with multiple informants as predictors that permits missingness in the predictors as well as the response. We provide closed form solutions when possible and analytically compare the ML technique to the existing Generalized Estimating Equations (GEE) approach. We demonstrate that the ML approach can be used to compare the effect of the informants on response without standardizing the data. Simulations incorporating missingness show that ML is more efficient than the existing GEE method. In the presence of MCAR missing data, we find through a simulation study that the ML approach is robust to a relatively extreme departure from the normality assumption. We implement both methods in a study investigating the association between physical activity and obesity with activity measured using multiple informants (children and their mothers).

13.

Sensitivity analysis to investigate the impact of a missing covariate on survival analyses using cancer registry data

Brian L. Egleston, Yu‐Ning Wong 《Statistics in medicine》2009,28(10):1498-1511

Having substantial missing data is a common problem in administrative and cancer registry data. We propose a sensitivity analysis to evaluate the impact of a covariate that is potentially missing not at random in survival analyses using Weibull proportional hazards regressions. We apply the method to an investigation of the impact of missing grade on post‐surgical mortality outcomes in individuals with metastatic kidney cancer. Data came from the Surveillance Epidemiology and End Results (SEER) registry which provides population‐based information on those undergoing cytoreductive nephrectomy. Tumor grade is an important component of risk stratification for patients with both localized and metastatic kidney cancer. Many individuals in SEER with metastatic kidney cancer are missing tumor grade information. We found that surgery was protective, but that the magnitude of the effect depended on assumptions about the relationship of grade with missingness. Copyright © 2009 John Wiley & Sons, Ltd.

14.

Analysis of non-ignorable missing and left-censored longitudinal data using a weighted random effects tobit model

Sattar A, Weissfeld LA, Molenberghs G 《Statistics in medicine》2011,30(27):3167-3180

In a longitudinal study with response data collected during a hospital stay, observations may be missing because of the subject's discharge from the hospital prior to completion of the study or the death of the subject, resulting in non-ignorable missing data. In addition to non-ignorable missingness, there is left-censoring in the response measurements because of the inherent limit of detection. For analyzing non-ignorable missing and left-censored longitudinal data, we propose extending the theory of the random effects tobit regression model to a weighted random effects tobit regression model. The weights are computed on the basis of inverse probability weighted augmented methodology. An extensive simulation study was performed to compare the performance of the proposed model with a number of competitive models. The simulation study shows that the estimates are consistent and that the root mean square errors of the estimates are smallest when augmented inverse probability weights are used in the random effects tobit model. The proposed method is also applied to the non-ignorable missing and left-censored interleukin-6 biomarker data obtained from the Genetic and Inflammatory Markers of Sepsis study.

15.

Longitudinal latent variable models given incompletely observed biomarkers and covariates

Chunfeng Ren, Yongyun Shin 《Statistics in medicine》2016,35(26):4729-4745

In this paper, we analyze a two‐level latent variable model for longitudinal data from the National Growth and Health Study where surrogate outcomes or biomarkers and covariates are subject to missingness at any of the levels. A conventional method for efficient handling of missing data is to re‐express the desired model as a joint distribution of variables, including the biomarkers, that are subject to missingness conditional on all of the covariates that are completely observed, and estimate the joint model by maximum likelihood, which is then transformed to the desired model. The joint model, however, identifies more parameters than desired, in general. We show that the over‐identified joint model produces biased estimation of the latent variable model and describe how to impose constraints on the joint model so that it has a one‐to‐one correspondence with the desired model for unbiased estimation. The constrained joint model handles missing data efficiently under the assumption of ignorable missing data and is estimated by a modified application of the expectation‐maximization algorithm. Copyright © 2016 John Wiley & Sons, Ltd.

16.

Huazhen Lin, Danping Liu, Xiao‐Hua Zhou 《Statistics in medicine》2010,29(2):236-247

The missing data problem is common in longitudinal or hierarchical structure studies. In this paper, we propose a correlated random‐effects model to fit normal longitudinal or cluster data when the missingness mechanism is nonignorable. Computational challenges arise in the model fitting due to intractable numerical integrations. We obtain the estimates of the parameters based on an accurate approximation of the log likelihood, which has higher‐order accuracy but with less computational burden than the existing approximation. We apply the proposed method to a real data set arising from an autism study. Copyright © 2009 John Wiley & Sons, Ltd.

17.

Identification of the optimal treatment regimen in the presence of missing covariates

Ying Huang, Xiao-Hua Zhou 《Statistics in medicine》2020,39(4):353-368

Covariates associated with treatment-effect heterogeneity can potentially be used to make personalized treatment recommendations towards best clinical outcomes. Methods for treatment-selection rule development that directly maximize treatment-selection benefits have attracted much interest in recent years, due to the robustness of these methods to outcome modeling. In practice, the task of treatment-selection rule development can be further complicated by missingness in data. Here, we consider the identification of optimal treatment-selection rules for a binary disease outcome when measurements of an important covariate from study participants are partly missing. Under the missing at random assumption, we develop a robust estimator of treatment-selection rules under the direct-optimization paradigm. This estimator targets the maximum selection benefits to the population under correct specification of at least one mechanism from each of the two sets: missing data or conditional covariate distribution, and treatment assignment or disease outcome model. We evaluate and compare performance of the proposed estimator with alternative direct-optimization estimators through extensive simulation studies. We demonstrate the application of the proposed method through a real data example from an Alzheimer's disease study for developing covariate combinations to guide the treatment of Alzheimer's disease.

18.

Simple generalized estimating equations (GEEs) and weighted generalized estimating equations (WGEEs) in longitudinal studies with dropouts: guidelines and implementation in R

Alejandro Salazar, Begoña Ojeda, María Dueñas, Fernando Fernández, Inmaculada Failde 《Statistics in medicine》2016,35(19):3424-3448

Missing data are a common problem in clinical and epidemiological research, especially in longitudinal studies. Despite many methodological advances in recent decades, many papers on clinical trials and epidemiological studies do not report using principled statistical methods to accommodate missing data or use ineffective or inappropriate techniques. Two refined techniques are presented here: generalized estimating equations (GEEs) and weighted generalized estimating equations (WGEEs). These techniques are an extension of generalized linear models to longitudinal or clustered data, where observations are no longer independent. They can appropriately handle missing data when the missingness is completely at random (GEE and WGEE) or at random (WGEE) and do not require the outcome to be normally distributed. Our aim is to describe and illustrate with a real example, in a simple and accessible way to researchers, these techniques for handling missing data in the context of longitudinal studies subject to dropout and show how to implement them in R. We apply them to assess the evolution of health‐related quality of life in coronary patients in a data set subject to dropout. Copyright © 2016 John Wiley & Sons, Ltd.

19.

Adjusting for partially missing baseline measurements in randomized trials

White IR, Thompson SG 《Statistics in medicine》2005,24(7):993-1007

Adjustment for baseline variables in a randomized trial can increase power to detect a treatment effect. However, when baseline data are partly missing, analysis of complete cases is inefficient. We consider various possible improvements in the case of normally distributed baseline and outcome variables. Joint modelling of baseline and outcome is the most efficient method. Mean imputation is an excellent alternative, subject to three conditions. Firstly, if baseline and outcome are correlated more than about 0.6 then weighting should be used to allow for the greater information from complete cases. Secondly, imputation should be carried out in a deterministic way, using other baseline variables if possible, but not using randomized arm or outcome. Thirdly, if baselines are not missing completely at random, then a dummy variable for missingness should be included as a covariate (the missing indicator method). The methods are illustrated in a randomized trial in community psychiatry.
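The second and third conditions in this abstract can be illustrated with a short Python sketch of the data preparation step: missing baselines are filled deterministically with the mean of the observed baselines (no use of randomized arm or outcome), and a 0/1 dummy records which values were imputed so that it can be added as a covariate. The function name `mean_impute_with_indicator`, the `None` convention for missing values, and the toy data are assumptions for illustration; the weighting mentioned in the first condition and the subsequent regression are not shown.

```python
def mean_impute_with_indicator(baseline):
    """Return (imputed_baseline, missing_indicator) for a list with None gaps.

    The fill value is the mean of the observed baselines pooled over all
    participants, so it depends on neither the trial arm nor the outcome.
    """
    observed = [b for b in baseline if b is not None]
    if not observed:
        raise ValueError("no observed baseline values to impute from")
    fill = sum(observed) / len(observed)
    imputed = [fill if b is None else b for b in baseline]
    # 1 marks an imputed value; this column is the missing-indicator covariate.
    indicator = [1 if b is None else 0 for b in baseline]
    return imputed, indicator


# Toy baseline measurements with one missing value.
imputed_baseline, missing_flag = mean_impute_with_indicator([1.0, None, 3.0])
```

Both returned columns would then enter the adjusted analysis alongside the treatment arm.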

20.

A tractable method to account for high-dimensional nonignorable missing data in intensive longitudinal data

Chengbo Yuan, Donald Hedeker, Robin Mermelstein, Hui Xie 《Statistics in medicine》2020,39(20):2589-2605

Despite the need for sensitivity analysis to nonignorable missingness in intensive longitudinal data (ILD), such analysis is greatly hindered by novel ILD features, such as large data volume and complex nonmonotonic missing-data patterns. The likelihood of alternative models permitting nonignorable missingness often involves very high-dimensional integrals, causing the curse of dimensionality and rendering solutions computationally prohibitive to obtain. We aim to overcome this challenge by developing a computationally feasible method, nonlinear indexes of local sensitivity to nonignorability (NISNI). We use linear mixed effects models for the incomplete outcome and covariates. We use Markov multinomial models to describe complex missing-data patterns and mechanisms in ILD, thereby permitting missingness probabilities to depend directly on missing data. Using a second-order Taylor series to approximate the likelihood under nonignorability, we develop formulas and closed-form expressions for NISNI. Our approach permits the outcome and covariates to be missing simultaneously, as is often the case in ILD, and can capture the U-shaped impact of nonignorability in the neighborhood of the missing at random model without fitting alternative models or evaluating integrals. We evaluate performance of this method using simulated data and real ILD collected by the ecological momentary assessment method.