Similar Articles
20 similar articles found (search time: 109 ms)
1.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
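The FICO index named in the abstract above can be made concrete with a short sketch. The interpretation used here — among rows where the covariate itself is observed, the fraction of rows that are incomplete cases because some other analysis variable is missing — is an assumption based on the abstract's wording, and the function name is invented for illustration:

```python
import numpy as np

def fico(x, other_covariates):
    """Fraction of Incomplete Cases among the Observed values of
    covariate x: of the rows where x is observed, the share that are
    incomplete cases (missing at least one other analysis variable)."""
    x = np.asarray(x, dtype=float)
    others = np.column_stack(
        [np.asarray(c, dtype=float) for c in other_covariates]
    )
    observed_x = ~np.isnan(x)                 # rows where x itself is present
    incomplete = np.isnan(others).any(axis=1) # rows missing another variable
    return (observed_x & incomplete).sum() / observed_x.sum()

x = [1.0, 2.0, np.nan, 4.0]
z = [np.nan, 1.0, 1.0, 1.0]
print(fico(x, [z]))  # 1 of the 3 observed-x rows is incomplete -> 0.333...
```

A high FICO suggests many incomplete cases could be rescued by MI given the observed covariate, i.e. a larger likely precision gain over CC.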

2.

Background

In molecular epidemiology studies biospecimen data are collected, often with the purpose of evaluating the synergistic role between a biomarker and another feature on an outcome. Typically, biomarker data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not customarily incorporated into analyses. Instead, complete-case (CC) analyses are performed, which can result in biased and inefficient estimates.

Methods

Through simulations, we characterized the performance of CC methods when interaction effects are estimated. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist.

Results

CC analyses were shown to result in considerable bias and efficiency loss. While MI reduced bias and increased efficiency over CC methods under specific conditions, it too resulted in biased estimates depending on the strength of the auxiliary data available and the nature of the missingness. In particular, CC performed better than MI when extreme values of the covariate were more likely to be missing, while MI outperformed CC when missingness of the covariate related to both the covariate and outcome. MI always improved performance when strong auxiliary data were available. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach.

Conclusions

Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Auxiliary variables may make this assumption more reasonable even if the data are NMAR. Under NMAR we emphasize caution when using standard MI and recommend it over CC only when strong auxiliary data are available. MI, with the missing data mechanism specified, is an alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI's ability to account for the uncertainty of these assumptions.

3.
Objective

Missing indicator method (MIM) and complete case analysis (CC) are frequently used to handle missing confounder data. Using empirical data, we demonstrated the degree and direction of bias in the effect estimate when using these methods compared with multiple imputation (MI).

Study Design and Setting

From a cohort study, we selected an exposure (marital status), outcome (depression), and confounders (age, sex, and income). Missing values in "income" were created according to different patterns of missingness: missing values were created completely at random and depending on exposure and outcome values. Percentages of missing values ranged from 2.5% to 30%.

Results

When missing values were completely random, MIM gave an overestimation of the odds ratio, whereas CC and MI gave unbiased results. MIM and CC gave under- or overestimations when missing values depended on observed values. The magnitude and direction of bias depended on how the missing values were related to exposure and outcome. Bias increased with increasing percentage of missing values.

Conclusion

MIM should not be used to handle missing confounder data because it gives unpredictable bias of the odds ratio even with small percentages of missing values. CC can be used when missing values are completely random, but it results in a loss of statistical power.
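To make the missing indicator method concrete, here is a minimal sketch of the mechanical step: fill missing confounder values with a constant and add a 0/1 missingness indicator as an extra regressor. The function name and mean-fill default are illustrative assumptions, and the abstract above warns that this method can bias the odds ratio — it is shown here only to clarify what the method does, not to endorse it:

```python
import numpy as np

def missing_indicator(x, fill_value=None):
    """Missing indicator method (MIM): replace missing values of a
    confounder with a constant (the observed mean by default) and
    return a 0/1 indicator of missingness to include as a regressor."""
    x = np.asarray(x, dtype=float)
    indicator = np.isnan(x).astype(int)
    if fill_value is None:
        fill_value = np.nanmean(x)
    filled = np.where(np.isnan(x), fill_value, x)
    return filled, indicator

income = [30.0, np.nan, 50.0, np.nan]
filled, ind = missing_indicator(income)
print(filled, ind)  # [30. 40. 50. 40.] [0 1 0 1]
```

In a regression one would then include both `filled` and `ind` as covariates; the bias documented above arises because the constant fill distorts the confounder-outcome relationship.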

4.
The treatment of missing data in comparative effectiveness studies with right-censored outcomes and time-varying covariates is challenging because of the multilevel structure of the data. In particular, the performance of an accessible method like multiple imputation (MI) under an imputation model that ignores the multilevel structure is unknown and has not been compared to complete-case (CC) and single imputation methods that are most commonly applied in this context. Through an extensive simulation study, we compared statistical properties among CC analysis, last value carried forward, mean imputation, the use of missing indicators, and MI-based approaches with and without auxiliary variables under an extended Cox model when the interest lies in characterizing relationships between non-missing time-varying exposures and right-censored outcomes. MI demonstrated favorable properties under a moderate missing-at-random condition (absolute bias <0.1) and outperformed CC and single imputation methods, even when the MI method did not account for correlated observations in the imputation model. The performance of MI decreased with increasing complexity such as when the missing data mechanism involved the exposure of interest, but was still preferred over other methods considered and performed well in the presence of strong auxiliary variables. We recommend considering MI that ignores the multilevel structure in the imputation model when data are missing in a time-varying confounder, incorporating variables associated with missingness in the MI models as well as conducting sensitivity analyses across plausible assumptions.
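Of the single-imputation comparators listed above, last value carried forward is the simplest to sketch: each missing visit of a subject's time-varying covariate takes the most recent observed value. This is an illustrative helper (name and interface assumed), shown only because LOCF is the comparator most often used in practice; the abstract finds MI preferable:

```python
import numpy as np

def locf(values):
    """Last value carried forward for one subject's visit sequence:
    fill each missing visit with the most recent observed value.
    A leading missing value has no predecessor and stays missing."""
    out = np.asarray(values, dtype=float).copy()
    for t in range(1, len(out)):
        if np.isnan(out[t]):
            out[t] = out[t - 1]
    return out

print(locf([1.0, np.nan, np.nan, 4.0]))  # [1. 1. 1. 4.]
```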

5.
The generalized estimating equations (GEE) approach is commonly used to model incomplete longitudinal binary data. When drop-outs are missing at random through dependence on observed responses (MAR), GEE may give biased parameter estimates in the model for the marginal means. A weighted estimating equations approach gives consistent estimation under MAR when the drop-out mechanism is correctly specified. In this approach, observations or person-visits are weighted inversely proportional to their probability of being observed. Using a simulation study, we compare the performance of unweighted and weighted GEE in models for time-specific means of a repeated binary response with MAR drop-outs. Weighted GEE resulted in smaller finite sample bias than GEE. However, when the drop-out model was misspecified, weighted GEE sometimes performed worse than GEE. Weighted GEE with observation-level weights gave more efficient estimates than a weighted GEE procedure with cluster-level weights.
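The weighting step described above can be sketched for the monotone-dropout case: given fitted conditional probabilities of remaining under observation at each visit (from an assumed dropout model, e.g. logistic regression on past responses), the observation-level weight is the inverse of the cumulative probability of still being observed. The function name and input layout are assumptions for illustration:

```python
import numpy as np

def observation_weights(p_obs_given_past):
    """Inverse-probability weights for weighted GEE under MAR dropout.

    p_obs_given_past[i, t] is P(subject i observed at visit t | observed
    at visit t-1), taken from a fitted dropout model.  With monotone
    dropout, P(observed at visit t) is the cumulative product over
    visits 0..t, and the observation-level weight is its inverse."""
    p = np.asarray(p_obs_given_past, dtype=float)
    cumulative = np.cumprod(p, axis=1)
    return 1.0 / cumulative

# One subject: certain at visit 0, 80% retained at visit 1, 50% at visit 2
w = observation_weights([[1.0, 0.8, 0.5]])
print(w)  # weights of 1, 1.25 and 2.5 - later visits are up-weighted
```

Each observed person-visit then contributes to the estimating equations with its weight, so subjects resembling those who dropped out count for more.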

6.
A popular method for analysing repeated‐measures data is generalized estimating equations (GEE). When response data are missing at random (MAR), two modifications of GEE use inverse‐probability weighting and imputation. The weighted GEE (WGEE) method involves weighting observations by their inverse probability of being observed, according to some assumed missingness model. Imputation methods involve filling in missing observations with values predicted by an assumed imputation model. WGEE are consistent when the data are MAR and the dropout model is correctly specified. Imputation methods are consistent when the data are MAR and the imputation model is correctly specified. Recently, doubly robust (DR) methods have been developed. These involve both a model for probability of missingness and an imputation model for the expectation of each missing observation, and are consistent when either is correct. We describe DR GEE, and illustrate their use on simulated data. We also analyse the INITIO randomized clinical trial of HIV therapy allowing for MAR dropout. Copyright © 2009 John Wiley & Sons, Ltd.

7.
Objective: To compare the statistical performance of control-based pattern-mixture models (PMM), mixed-effects models for repeated measures (MMRM), and multiple imputation (MI) in handling quantitative longitudinal data with multiple coexisting missingness mechanisms. Methods: Monte Carlo simulation was used to generate quantitative longitudinal datasets containing two or three of the following mechanisms: missing completely at random, missing at random, and missing not at random; the statistical performance of the three classes of methods was then evaluated. Results: Control-based PMM kept the type I error rate at a low level but had the lowest power. MMRM and MI controlled the type I error rate, with higher power than control-based PMM. When there was no difference in efficacy between the two groups, the estimation errors of all methods were comparable and control-based PMM had the highest 95% confidence interval coverage; when there was a difference, each method was affected by the proportion of missing data consistent with its assumed missingness mechanism. When data missing not at random were present, control-based PMM essentially did not overestimate the treatment difference and had the highest 95% confidence interval coverage, whereas MMRM and MI overestimated the treatment difference and had lower coverage. The 95% confidence interval widths of all methods were comparable. Conclusion: When analysing longitudinal data with multiple coexisting missingness mechanisms, especially data missing not at random, the statistical performance of MMRM and MI is reduced; control-based PMM can be used for sensitivity analysis, but attention must be paid to its specific assumptions to avoid overly conservative estimates.

8.
In this paper, we develop methods to combine multiple biomarker trajectories into a composite diagnostic marker using functional data analysis (FDA) to achieve better diagnostic accuracy in monitoring disease recurrence in the setting of a prospective cohort study. In such studies, the disease status is usually verified only for patients with a positive test result in any biomarker and is missing in patients with negative test results in all biomarkers. Thus, the test result will affect disease verification, which leads to verification bias if the analysis is restricted only to the verified cases. We treat verification bias as a missing data problem. Under both missing at random (MAR) and missing not at random (MNAR) assumptions, we derive the optimal classification rules using the Neyman-Pearson lemma based on the composite diagnostic marker. We estimate thresholds adjusted for verification bias to dichotomize patients as test positive or test negative, and we evaluate the diagnostic accuracy using the verification bias corrected area under the ROC curves (AUCs). We evaluate the performance and robustness of the FDA combination approach and assess the consistency of the approach through simulation studies. In addition, we perform a sensitivity analysis of the dependency between the verification process and disease status for the approach under the MNAR assumption. We apply the proposed method on data from the Religious Orders Study and from a non-small cell lung cancer trial.

9.
Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non‐normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non‐parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non‐normal data. We generated data from a range of non‐normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero‐skewness log, Box–Cox or non‐parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term‐born infants. The results provide novel empirical evidence that the decision regarding how to impute a non‐normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non‐linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
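The core matching step of PMM can be sketched as follows. This is a deliberately simplified, single-imputation version (function name and interface assumed): a full type 1 MI implementation would also draw the regression coefficients from their posterior before predicting, a step omitted here for brevity:

```python
import numpy as np

def pmm_impute(y, X, k=1, rng=None):
    """Predictive mean matching, simplified sketch.

    Fit OLS of y on X using complete cases, predict for all rows, and
    for each missing y choose a donor among the k observed cases with
    the closest predicted values; impute the donor's *observed* y, so
    imputations are always plausible values of the raw variable."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    obs = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        donors = np.argsort(np.abs(pred[obs] - pred[i]))[:k]
        y_imp[i] = y[obs][rng.choice(donors)]
    return y_imp

y = np.array([1.0, 2.0, 3.0, np.nan])
x = np.array([1.0, 2.0, 3.0, 2.9])
print(pmm_impute(y, x))  # the missing value takes the nearest donor's y: 3.0
```

Because imputed values are drawn from observed donors rather than from a normal model, PMM sidesteps the normality assumption discussed in the abstract.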

10.
Many cohort studies and clinical trials are designed to compare rates of change over time in one or more disease markers in several groups. One major problem in such longitudinal studies is missing data due to patient drop-out. The bias and efficiency of six different methods to estimate rates of change in longitudinal studies with incomplete observations were compared: generalized estimating equation estimates (GEE) proposed by Liang and Zeger (1986); unweighted average of ordinary least squares (OLSE) of individual rates of change (UWLS); weighted average of OLSE (WLS); conditional linear model estimates (CLE), a covariate-type estimator proposed by Wu and Bailey (1989); random effect (RE); and joint multivariate RE (JMRE) estimates. The latter method combines a linear RE model for the underlying pattern of the marker with a log-normal survival model for an informative drop-out process. The performance of these methods in the presence of data missing completely at random (MCAR), missing at random (MAR) and non-ignorably missing (NIM) was compared in simulation studies. Data for the disease marker were generated under the linear random effects model with parameter values derived from realistic examples in HIV infection. Rates of drop-out, assumed to increase over time, were allowed to be independent of marker values or to depend either only on previous marker values or on both previous and current marker values. Under MCAR all six methods yielded unbiased estimates of both group mean rates and between-group difference. However, the cross-sectional view of the data in the GEE method resulted in seriously biased estimates under MAR and NIM drop-out processes. The bias in the estimates ranged from 30 per cent to 50 per cent. The degree of bias in the GEE estimates increases with the severity of non-randomness and with the proportion of MAR data. Under MCAR and MAR all the other five methods performed relatively well. RE and JMRE estimates were more efficient (that is, had smaller variance) than UWLS, WLS and CLE. Under NIM, WLS and particularly RE estimates tended to underestimate the average rate of marker change (bias approximately 10 per cent). Under NIM, UWLS, CLE and JMRE performed better in terms of bias (3-5 per cent), with JMRE giving the most efficient estimates. Given that markers are key variables related to disease progression, missing marker data are likely to be at least MAR. Thus, the GEE method may not be appropriate for analysing such longitudinal marker data. The potential biases due to incomplete data require greater recognition in reports of longitudinal studies. Sensitivity analyses to assess the effect of drop-outs on inferences about the target parameters are important.

11.
The analysis of data from longitudinal studies requires special techniques, which take into account the fact that the repeated measurements within one individual are correlated. In this paper, the two most commonly used techniques to analyze longitudinal data are compared: generalized estimating equations (GEE) and random coefficient analysis. Both techniques were used to analyze a longitudinal dataset with six measurements on 147 subjects. The purpose of the example was to analyze the relationship between serum cholesterol and four predictor variables, i.e., physical fitness at baseline, body fatness (measured by sum of the thickness of four skinfolds), smoking and gender. The results showed that for a continuous outcome variable, GEE and random coefficient analysis gave comparable results, i.e., GEE analysis with an exchangeable correlation structure and random coefficient analysis with only a random intercept were the same. There was also no difference between both techniques in the analysis of a dataset with missing data, even when the missing data were highly selective on earlier observed data. For a dichotomous outcome variable, the magnitude of the regression coefficients and standard errors was higher when calculated with random coefficient analysis than when calculated with GEE analysis. Analysis of a dataset with missing data and a dichotomous outcome variable showed unpredictable results for both GEE and random coefficient analysis. It can be concluded that for a continuous outcome variable, GEE and random coefficient analysis are comparable. Longitudinal data analysis with dichotomous outcome variables should, however, be interpreted with caution, especially when there are missing data.

12.
Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing not at random mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given data set, its corresponding population marginal distribution might also be available in an external data source. We show how this information can be readily utilised in the imputation model to calibrate inference to the population by incorporating an appropriately calculated offset termed the “calibrated-δ adjustment.” We describe the derivation of this offset from the population distribution of the incomplete variable and show how, in applications, it can be used to closely (and often exactly) match the post-imputation distribution to the population level. Through analytic and simulation studies, we show that our proposed calibrated-δ adjustment MI method can give the same inference as standard MI when data are MAR, and can produce more accurate inference under two general missing not at random missingness mechanisms. The method is used to impute missing ethnicity data in a type 2 diabetes prevalence case study using UK primary care electronic health records, where it results in scientifically relevant changes in inference for non-White ethnic groups compared with standard MI. Calibrated-δ adjustment MI represents a pragmatic approach for utilising available population-level information in a sensitivity analysis to explore potential departures from the MAR assumption.
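The idea of an offset calibrated to an external population margin can be illustrated with a drastically simplified sketch for a continuous variable: impute under MAR, then solve for the constant δ that makes the post-imputation marginal mean match the known population mean. The paper applies the offset inside the imputation model (and handles categorical ethnicity data); this closed-form, single-imputation version is an illustrative assumption only:

```python
import numpy as np

def calibrated_delta(y, population_mean):
    """Toy calibrated-delta adjustment for a continuous variable.

    Impute missing values by the observed mean (a stand-in for a MAR
    imputation model), then choose the offset delta so that the
    post-imputation marginal mean equals the known population mean."""
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    n, n_mis = len(y), int((~obs).sum())
    naive_imp = np.full(n_mis, y[obs].mean())
    # Solve: population_mean == (sum(observed) + sum(naive_imp + delta)) / n
    delta = (population_mean * n - y[obs].sum() - naive_imp.sum()) / n_mis
    y_imp = y.copy()
    y_imp[~obs] = naive_imp + delta
    return y_imp, delta

y = np.array([1.0, 3.0, np.nan, np.nan])
y_imp, delta = calibrated_delta(y, population_mean=3.0)
print(delta, y_imp.mean())  # 2.0 3.0
```

A nonzero δ quantifies the departure from MAR implied by the external margin, which is what makes the method useful as a sensitivity analysis.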

13.
Existing methods for power analysis for longitudinal study designs are limited in that they do not adequately address random missing data patterns. Although the pattern of missing data can be assessed during data analysis, it is unknown during the design phase of a study. The random nature of the missing data pattern adds another layer of complexity in addressing missing data for power analysis. In this paper, we model the occurrence of missing data with a two-state, first-order Markov process and integrate the modelling information into the power function to account for random missing data patterns. The Markov model is easily specified to accommodate different anticipated missing data processes. We develop this approach for the two most popular longitudinal models: the generalized estimating equations (GEE) and the linear mixed-effects model under the missing completely at random (MCAR) assumption. For GEE, we also limit our consideration to the working independence correlation model. The proposed methodology is illustrated with numerous examples that are motivated by real study designs.
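A two-state, first-order Markov missingness process of the kind described above is easy to simulate, which is one way to sanity-check a planned design's expected observation pattern. The function name, the two transition parameters, and the everyone-observed-at-baseline convention are assumptions for illustration:

```python
import numpy as np

def simulate_missingness(n_subjects, n_visits, p_stay_obs, p_return, rng=None):
    """Simulate observation indicators with a two-state, first-order
    Markov chain.  States: 1 = observed, 0 = missing.

    p_stay_obs = P(observed at t | observed at t-1)
    p_return   = P(observed at t | missing at t-1)  (intermittent return)
    All subjects are taken as observed at the first visit."""
    rng = np.random.default_rng(rng)
    R = np.ones((n_subjects, n_visits), dtype=int)
    for t in range(1, n_visits):
        p = np.where(R[:, t - 1] == 1, p_stay_obs, p_return)
        R[:, t] = rng.random(n_subjects) < p
    return R

R = simulate_missingness(1000, 5, p_stay_obs=0.9, p_return=0.3, rng=0)
print(R.shape, bool(R[:, 0].all()))  # (1000, 5) True
```

Setting `p_return=0` reduces the chain to monotone dropout; averaging `R` over many replicates gives the expected per-visit observation rates that feed the power function.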

14.
When competing risks data arise, information on the actual cause of failure for some subjects might be missing. Therefore, a cause-specific proportional hazards model together with multiple imputation (MI) methods have been used to analyze such data. Modelling the cumulative incidence function is also of interest, and thus we investigate the proportional subdistribution hazards model (Fine and Gray model) together with MI methods as a modelling approach for competing risks data with missing cause of failure. Possible strategies for analyzing such data include the complete case analysis as well as an analysis where the missing causes are classified as an additional failure type. These approaches, however, may produce misleading results in clinical settings. In the present work we investigate the bias of the parameter estimates when fitting the Fine and Gray model in the above modelling approaches. We also apply the MI method and evaluate its comparative performance under various missing data scenarios. Results from simulation experiments showed that there is substantial bias in the estimates when fitting the Fine and Gray model with naive techniques for missing data, under missing at random cause of failure. Compared to those techniques the MI-based method gave estimates with much smaller biases and coverage probabilities of 95 per cent confidence intervals closer to the nominal level. All three methods were also applied on real data modelling time to AIDS or non-AIDS cause of death in HIV-1 infected individuals.

15.
Partial verification refers to the situation where a subset of patients is not verified by the reference (gold) standard and is excluded from the analysis. If partial verification is present, the observed (naive) measures of accuracy such as sensitivity and specificity are most likely to be biased. Recently, Harel and Zhou showed that partial verification can be considered as a missing data problem and that multiple imputation (MI) methods can be used to correct for this bias. They claim that even in simple situations where the verification is random within strata of the index test results, the so-called Begg and Greenes (B&G) correction method underestimates sensitivity and overestimates specificity as compared with the MI method. However, we were able to demonstrate that the B&G method produces similar results as MI, and that the claimed difference has been caused by a computational error. Additional research is needed to better understand which correction methods should be preferred in more complex scenarios of missing reference test outcome in diagnostic research.
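The Begg and Greenes correction discussed above assumes verification depends only on the index test result, estimates P(disease | test result) from the verified subset in each test stratum, and recombines these with the full test-result margins via Bayes' rule. A minimal sketch (function name and argument layout assumed; counts are invented for the example):

```python
def begg_greenes(n_pos, n_neg, ver_pos_d1, ver_pos_d0, ver_neg_d1, ver_neg_d0):
    """Begg & Greenes correction for partial verification.

    n_pos, n_neg: all tested subjects with positive/negative index test.
    ver_pos_d1:   verified test-positives found diseased, etc.
    P(D+|T) is estimated within each verified test stratum and then
    recombined with the full test margins to recover Se and Sp."""
    p_d1_tpos = ver_pos_d1 / (ver_pos_d1 + ver_pos_d0)
    p_d1_tneg = ver_neg_d1 / (ver_neg_d1 + ver_neg_d0)
    se = n_pos * p_d1_tpos / (n_pos * p_d1_tpos + n_neg * p_d1_tneg)
    sp = n_neg * (1 - p_d1_tneg) / (
        n_neg * (1 - p_d1_tneg) + n_pos * (1 - p_d1_tpos)
    )
    return se, sp

# 100 test-positives (80 verified: 60 diseased, 20 not);
# 200 test-negatives (40 verified: 4 diseased, 36 not)
se, sp = begg_greenes(100, 200, 60, 20, 4, 36)
print(round(se, 3), round(sp, 3))  # 0.789 0.878
```

The naive verified-only estimates (Se = 60/64, Sp = 36/56) differ markedly, which is the verification bias the correction removes.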

16.
Aims

Missing health-related quality of life (HRQOL) data in clinical trials can impact conclusions, but the effect has not been thoroughly studied in HIV clinical trials. Despite repeated recommendations to avoid complete case (CC) analysis and last observation carried forward (LOCF), these approaches are commonly used to handle missing data. The goal of this investigation is to describe the use of different analytic methods under assumptions of missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), using HIV as an empirical example.

Methods

Medical Outcomes Study HIV (MOS-HIV) Health Survey data were combined from two large open-label multinational HIV clinical trials comparing treatments A and B over 48 weeks. Inclusion in the HRQOL analysis required completion of the MOS-HIV at baseline and at least one follow-up visit (weeks 8, 16, 24, 40, 48). Primary outcomes for the analysis were change from week 0 to 48 in mental health summary (MHS), physical health summary (PHS), pain and health distress scores, analyzed using CC, LOCF, generalized estimating equations (GEE), direct likelihood, and sensitivity analyses using a joint mixed-effects model and Markov chain Monte Carlo (MCMC) multiple imputation. Time and treatment were included in all models. Baseline and longitudinal variables (adverse event and reason for discontinuation) were only used in the imputation model.

Results

A total of 511 patients randomized to treatment A and 473 to treatment B completed the MOS-HIV at baseline and at least one follow-up visit. At week 48, 71% of patients on treatment A and 31% on treatment B completed the MOS-HIV survey. Examining changes within each treatment group, CC and MCMC generally produced the largest or most positive changes. The joint model was most conservative; direct likelihood and GEE produced intermediate results; LOCF showed no consistent trend. There was greater spread for within-group changes than between-group differences (within MHS scores for treatment A: −0.1 to 1.6, treatment B: 0.4 to 2.0; between groups: −0.7 to 0.4; within PHS scores for treatment A: −1.5 to 0.4, treatment B: −1.7 to −0.2; between groups: 0.1 to 1.1). The size of within-group changes and between-group differences was of similar magnitude for the pain and health distress scores. In all cases, the range of estimates was small (<0.2 SD: less than 2 points for the summary scores and 5 points for the subscale scores).

Conclusions

Use of the recommended likelihood-based models that do not require the MCAR assumption was very feasible. Sensitivity analyses using auxiliary information can help to investigate the potential effect that missing data have on results, but require planning to ensure that relevant data are prospectively collected.

17.
The generalized estimating equation (GEE), a distribution‐free, or semi‐parametric, approach for modeling longitudinal data, is used in a wide range of behavioral, psychotherapy, pharmaceutical drug safety, and healthcare‐related research studies. Most popular methods for assessing model fit are based on the likelihood function for parametric models, rendering them inappropriate for distribution‐free GEE. One rare exception is a score statistic initially proposed by Tsiatis (1980) for logistic regression and later extended to GEE by Barnhart and Williamson (1998). Because GEE only provides valid inference under the missing completely at random assumption, and missing values arising in most longitudinal studies do not follow such a restricted mechanism, this GEE‐based score test has very limited application in practice. We propose extensions of this goodness‐of‐fit test to address missing data under the missing at random assumption, a more realistic model that applies to most studies in practice. We examine the performance of the proposed tests using simulated data and demonstrate the utility of such tests with data from a real study on geriatric depression and associated medical comorbidities. Copyright © 2013 John Wiley & Sons, Ltd.

18.
Several approaches exist for handling missing covariates in the Cox proportional hazards model. Multiple imputation (MI) is relatively easy to implement, with various software available, and results in consistent estimates if the imputation model is correct. On the other hand, the fully augmented weighted estimators (FAWEs) recover a substantial proportion of the efficiency and have the doubly robust property. In this paper, we compare the FAWEs and MI through a comprehensive simulation study. For MI, we consider multiple imputation by chained equations and focus on two imputation methods: Bayesian linear regression imputation and predictive mean matching. Simulation results show that the imputation methods can be rather sensitive to model misspecification and may have large bias when the censoring time depends on the missing covariates. In contrast, the FAWEs allow the censoring time to depend on the missing covariates and are remarkably robust as long as either the conditional expectations or the selection probability is correctly specified, due to the doubly robust property. The comparison suggests that the FAWEs show the potential for being a competitive and attractive tool for tackling the analysis of survival data with missing covariates. Copyright © 2010 John Wiley & Sons, Ltd.

19.
Several methods for the estimation and comparison of rates of change in longitudinal studies with staggered entry and informative drop-outs have been recently proposed. For multivariate normal linear models, REML estimation is used. There are various approaches to maximizing the corresponding log-likelihood; in this paper we use a restricted iterative generalized least squares method (RIGLS) combined with a nested EM algorithm. An important statistical problem in such approaches is the estimation of the standard errors adjusted for the missing data (observed data information matrix). Louis has provided a general technique for computing the observed data information in terms of completed data quantities within the EM framework. The multiple imputation (MI) method for obtaining variances can be regarded as an alternative to this. The aim of this paper is to develop, apply and compare the Louis method and a modified MI method in the setting of longitudinal studies where the source of missing data is either death or disease progression (informative) or end of the study (assumed non-informative). Longitudinal data are simultaneously modelled with the missingness process. The methods are illustrated by modelling CD4 count data from an HIV-1 clinical trial and evaluated through simulation studies. Both methods, Louis and MI, are used with Monte Carlo simulations of the missing data using the appropriate conditional distributions, the former with 100 simulations, the latter with 5 and 10. It is seen that naive SEs based on the completed data likelihood can be seriously biased. This bias was largely corrected by the Louis and modified MI methods, which gave broadly similar estimates. Given the relative simplicity of the modified MI method, it may be preferable.

20.
Missing data are common in longitudinal studies due to drop‐out, loss to follow‐up, and death. Likelihood‐based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). This assumption, however, is not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood‐based joint modeling approach is consistent and most efficient when correctly specified. However, mis‐specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis‐specified, though it is more robust to mis‐specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study. Copyright © 2009 John Wiley & Sons, Ltd.
