Similar Literature
20 similar articles found.
1.
Overcoming bias due to confounding and missing data is challenging when analyzing observational data. Propensity scores are commonly used to account for the former problem and multiple imputation for the latter. Unfortunately, it is not known how best to proceed when both techniques are required. We investigate whether two different approaches to combining propensity scores and multiple imputation (Across and Within) lead to differences in the accuracy or precision of exposure effect estimates. Both approaches start by imputing missing values multiple times. Propensity scores are then estimated for each resulting dataset. Under the Across approach, each subject's mean propensity score across imputations is used in a single subsequent analysis. Alternatively, the Within approach uses the propensity scores individually to obtain an exposure effect estimate in each imputation, and these estimates are combined to produce an overall estimate. The approaches were compared in a series of Monte Carlo simulations and applied to data from the British Society for Rheumatology Biologics Register. Results indicated that the Within approach produced unbiased estimates with appropriate confidence intervals, whereas the Across approach produced biased results and unrealistic confidence intervals. Researchers are encouraged to implement the Within approach when conducting propensity score analyses with incomplete data.
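The difference between the two pooling orders is easy to make concrete in code. The following Python sketch is illustrative only: the function and variable names are hypothetical, an inverse-probability-weighted mean difference stands in for the paper's unspecified effect estimator, and Rubin's rules for the pooled variance are omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_scores(X, treat):
    """Estimate P(treated | covariates) by logistic regression."""
    return LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]

def ipw_effect(y, treat, ps):
    """Inverse-probability-weighted mean difference (one of several ways a
    propensity score can be used; the paper does not prescribe this one)."""
    w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))
    t, c = treat == 1, treat == 0
    return np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c])

def within_approach(imputed_Xs, y, treat):
    """Within: estimate the effect separately in each imputed dataset, then
    combine (Rubin's rules would also pool the variances; omitted here)."""
    effects = [ipw_effect(y, treat, propensity_scores(X, treat)) for X in imputed_Xs]
    return np.mean(effects)

def across_approach(imputed_Xs, y, treat):
    """Across: average each subject's propensity score over the imputations,
    then run a single analysis on the averaged scores."""
    ps_mean = np.mean([propensity_scores(X, treat) for X in imputed_Xs], axis=0)
    return ipw_effect(y, treat, ps_mean)
```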

2.
Background and Objectives: As a result of the development of sophisticated techniques, such as multiple imputation, interest in handling missing data in longitudinal studies has increased enormously in past years. Within the field of longitudinal data analysis, there is a current debate on whether it is necessary to use multiple imputation before performing a mixed-model analysis of longitudinal data. In the current study this necessity is evaluated.
Study Design and Setting: The results of mixed-model analyses with and without multiple imputation were compared with each other. Four data sets with missing values were created: one with data missing completely at random, two with data missing at random, and one with data missing not at random. In all data sets, the relationships between a continuous outcome variable and two different covariates were analyzed: a time-independent dichotomous covariate and a time-dependent continuous covariate.
Results: Although for all types of missing data the results of the mixed-model analysis with and without multiple imputation differed slightly, they were not in favor of either approach. In addition, repeating the multiple imputation 100 times showed that the results of the mixed-model analysis with multiple imputation were quite unstable.
Conclusion: It is not necessary to handle missing data using multiple imputation before performing a mixed-model analysis on longitudinal data.
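The "without multiple imputation" arm of such a comparison is simply a mixed model fitted to all available observations. A minimal statsmodels sketch on synthetic data (hypothetical column names, not the study's dataset) shows why no separate imputation step is required: the model uses every non-missing outcome row.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy long-format data: 50 subjects x 4 visits, ~20% of outcomes missing (MCAR).
rng = np.random.default_rng(0)
n_subj, n_visits = 50, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_visits),
    "group": np.repeat(rng.integers(0, 2, n_subj), n_visits),  # time-independent, dichotomous
    "biomarker": rng.normal(size=n_subj * n_visits),           # time-dependent, continuous
})
rand_int = np.repeat(rng.normal(size=n_subj), n_visits)        # subject random intercepts
df["outcome"] = df["group"] + 0.5 * df["biomarker"] + rand_int \
    + rng.normal(size=n_subj * n_visits)
df.loc[rng.random(len(df)) < 0.2, "outcome"] = np.nan

# The mixed model simply uses every available observation, which is why a
# preceding multiple-imputation step may be unnecessary.
fit = smf.mixedlm("outcome ~ group + biomarker", data=df.dropna(),
                  groups="subject").fit()
print(fit.summary())
```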

3.
Although missing outcome data are an important problem in randomized trials and observational studies, methods to address this issue can be difficult to apply. Using simulated data, the authors compared 3 methods of handling missing outcome data: 1) complete case analysis; 2) single imputation; and 3) multiple imputation (all 3 with and without covariate adjustment). Simulated scenarios focused on continuous or dichotomous missing outcome data from randomized trials or observational studies. When outcomes were missing at random, single and multiple imputation yielded unbiased estimates after covariate adjustment. Estimates obtained by complete case analysis with covariate adjustment were unbiased as well, with coverage close to 95%. When outcome data were missing not at random, all methods gave biased estimates, but handling the missing outcome data by means of any of the 3 methods reduced the bias compared with a complete case analysis without covariate adjustment. Complete case analysis with covariate adjustment and multiple imputation yield similar estimates in the event of missing outcome data, as long as the same predictors of missingness are included. Hence, complete case analysis with covariate adjustment can and should be used as the analysis of choice more often. Multiple imputation, in addition, can accommodate the missing-not-at-random scenario more flexibly, making it especially suited for sensitivity analyses.

4.
BACKGROUND AND OBJECTIVES: To illustrate the effects of different methods for handling missing data (complete case analysis, the missing-indicator method, single imputation of the unconditional and conditional mean, and multiple imputation [MI]) in the context of multivariable diagnostic research aiming to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence. METHODS: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had (varying percentages of) missing values. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis. RESULTS: The area under the receiver operating characteristic curve was above 0.75 for all diagnostic models. The predictors in the final models based on complete case analysis, and after using the missing-indicator method, were very different from those in the other models. The models based on MI did not differ much from the models derived after using single conditional and unconditional mean imputation. CONCLUSION: In multivariable diagnostic research, complete case analysis and the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. In our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.
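For concreteness, the missing-indicator method that this abstract recommends avoiding takes only a few lines of pandas. The sketch below is a hypothetical illustration; the column names are invented, not taken from the pulmonary embolism study.

```python
import pandas as pd

def add_missing_indicator(df, col):
    """Missing-indicator method: add a dummy flagging missingness, then fill
    the hole with the unconditional mean (the approach the study advises
    against in multivariable diagnostic research)."""
    out = df.copy()
    out[f"{col}_missing"] = out[col].isna().astype(int)
    out[col] = out[col].fillna(out[col].mean())
    return out

# Hypothetical mini-example (not the study's data):
df = pd.DataFrame({"d_dimer": [0.4, None, 1.2, None], "pe": [0, 0, 1, 1]})
print(add_missing_indicator(df, "d_dimer"))
```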

5.
In 1999, dual-energy x-ray absorptiometry (DXA) scans were added to the National Health and Nutrition Examination Survey (NHANES) to provide information on soft tissue composition and bone mineral content. However, in 1999-2004, DXA data were missing, in whole or in part, for about 21 percent of the NHANES participants eligible for the DXA examination, and the missingness was associated with important characteristics such as body mass index and age. To handle this missing-data problem, multiple imputation of the missing DXA data was performed. Several features made the project statistically interesting and challenging, including the relationship between missingness on the DXA measures and the values of other variables; the highly multivariate nature of the variables being imputed; the need to transform the DXA variables during the imputation process; the desire to use a large number of non-DXA predictors, many of which had small amounts of missing data themselves, in the imputation models; the use of lower bounds in the imputation procedure; and the relationships between the DXA variables and other variables, which helped both in creating and in evaluating the imputations. This paper describes the imputation models, methods, and evaluations for this publicly available data resource and demonstrates properties of the imputations via examples of analyses of the data. The analyses suggest that imputation helps to correct biases that occur in estimates based on the data without imputation, and that it increases the precision of estimates as well. Moreover, multiple imputation usually yields larger estimated standard errors than those obtained with single imputation.

6.
In medical diagnostic studies, verification of the true disease status may be missing for some subjects, with the missingness depending on the results of diagnostic tests and other characteristics of the subjects. Because estimates of the area under the ROC curve (AUC) based on only the verified subjects are usually biased, it is generally necessary to estimate the AUC from a bias-corrected ROC curve. In this article, various direct estimation methods for the AUC based on hybrid imputation [full imputation and mean score imputation (MSI)], inverse probability weighting (IPW), and the semiparametric efficient (SPE) approach are proposed and compared in the presence of verification bias when the test result is continuous, under the assumption that the true disease status, if missing, is missing at random. Simulation results show that the proposed estimators are accurate under the biased sampling if the disease and verification models are correctly specified. The SPE- and MSI-based estimators perform well even under misspecified disease/verification models. Numerical studies are performed to compare the finite sample performance of the proposed approaches with existing methods. A real dataset from a neonatal hearing screening study is analyzed.
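As one concrete instance, the inverse-probability-weighting approach re-weights each verified subject by the inverse of its estimated verification probability and then computes a weighted Mann-Whitney form of the AUC. The sketch below is an assumption-laden illustration: the names are hypothetical, and a plain logistic model for verification is assumed, which the article does not prescribe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_auc(score, covar, verified, disease):
    """IPW-corrected AUC under verification bias: model P(verified | score,
    covariate), then compute a weighted Mann-Whitney statistic over the
    verified subjects only. `disease` is observed only where verified == 1."""
    X = np.column_stack([score, covar])
    pi = LogisticRegression(max_iter=1000).fit(X, verified).predict_proba(X)[:, 1]
    v = verified.astype(bool)
    w, s, d = 1.0 / pi[v], score[v], disease[v].astype(bool)
    num = den = 0.0
    for i in np.where(d)[0]:          # diseased, verified
        for j in np.where(~d)[0]:     # non-diseased, verified
            wij = w[i] * w[j]
            num += wij * (float(s[i] > s[j]) + 0.5 * float(s[i] == s[j]))
            den += wij
    return num / den
```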

7.
Background: Statistical analysis of a data set with missing data is a frequent problem in epidemiology. Methods are available to manage incomplete observations, avoiding biased estimates and improving their precision, compared with more traditional methods such as the analysis of the sub-sample of complete observations.
Methods: One of these approaches is multiple imputation, which consists of successively imputing several values for each missing data item, thereby generating several completed data sets with the same distributional characteristics (variability and correlations) as the observed data. Standard analyses are done separately on each completed dataset and then combined to obtain a global result. In this paper, we discuss the various assumptions made about the origin of missing data (at random or not), and we present the process of multiple imputation in a pragmatic way. A recent method, Multiple Imputation by Chained Equations (MICE), based on a Markov chain Monte Carlo algorithm under the missing at random (MAR) hypothesis, is described. An illustrative example of the MICE method is detailed for the analysis of the relation between a dichotomous variable and two covariates presenting MAR data with no particular structure, through multivariable logistic regression.
Results: Taking the original dataset without missing data as the reference, the MICE method substantially improved the regression coefficient estimates relative to those obtained from the analysis of complete observations only.
Conclusion: This method does not require any direct assumption on the joint distribution of the variables, and it is presently implemented in standard statistical software (Splus, Stata). It can be used for multiple imputation of missing data on several variables with no particular structure.
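A chained-equations imputation of the same flavor can be sketched in Python with scikit-learn's IterativeImputer (which its documentation describes as inspired by MICE), with Rubin's rules used to pool the M logistic-regression fits. This is a generic illustration on synthetic data, not the paper's analysis.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def mice_pool(X, y, M=10):
    """Impute X M times with chained equations, fit a logistic regression on
    each completed dataset, and combine the fits with Rubin's rules."""
    coefs, variances = [], []
    for m in range(M):
        imputer = IterativeImputer(sample_posterior=True, random_state=m)
        X_complete = imputer.fit_transform(X)
        fit = sm.Logit(y, sm.add_constant(X_complete)).fit(disp=0)
        coefs.append(fit.params)
        variances.append(fit.bse ** 2)
    Q = np.mean(coefs, axis=0)            # pooled point estimates
    U = np.mean(variances, axis=0)        # within-imputation variance
    B = np.var(coefs, axis=0, ddof=1)     # between-imputation variance
    T = U + (1 + 1 / M) * B               # Rubin's total variance
    return Q, np.sqrt(T)

# Synthetic demo: a dichotomous outcome, two covariates, one with 30% of
# its values deleted completely at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (rng.random(500) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
X[rng.random(500) < 0.3, 1] = np.nan
print(mice_pool(X, y, M=5))
```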

8.
Multiple imputation of baseline data in the Cardiovascular Health Study
Most epidemiologic studies will encounter missing covariate data. Software packages typically used for analyzing data delete any cases with a missing covariate in order to perform a complete case analysis. This deletion of cases complicates variable selection when different variables are missing on different cases, reduces power, and creates the potential for bias in the resulting estimates. Recently, software has become available for producing multiple imputations of missing data that account for the between-imputation variability. The implementation of this software to impute missing baseline data in the setting of the Cardiovascular Health Study, a large observational study, is described. Results of exploratory analyses using the imputed data were largely consistent with results using only complete cases, even in a situation where one third of the cases were excluded from the complete case analysis. There were few differences in the exploratory results across the three imputations, and the combined results from the multiple imputations were very similar to the results from a single imputation. An increase in power was evident, and variable selection was simplified when using the imputed data sets.

9.
Multiple imputation is a strategy for the analysis of incomplete data that mitigates the impact of the missingness on the power and bias of estimates. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important because different studies often measure different variables, meaning that we may need to impute data on a variable that is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would apply to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important for obtaining correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
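The recommended ordering, Rubin's rules within each study first and inverse-variance meta-analysis second, can be sketched as follows. This is a schematic under the assumption that each study supplies its M per-imputation estimates and variances; all names and numbers are hypothetical.

```python
import numpy as np

def rubin(estimates, variances):
    """Rubin's rules: pooled point estimate and total variance for one study."""
    M = len(estimates)
    Q = np.mean(estimates)                 # pooled estimate
    U = np.mean(variances)                 # within-imputation variance
    B = np.var(estimates, ddof=1)          # between-imputation variance
    return Q, U + (1 + 1 / M) * B

def iv_meta(effects, variances):
    """Fixed-effect inverse-variance weighted meta-analysis."""
    w = 1 / np.asarray(variances)
    return np.sum(w * effects) / np.sum(w), 1 / np.sum(w)

def impute_then_meta(per_study_estimates, per_study_variances):
    """Pool the M imputations within each study first, then meta-analyze
    (the ordering the paper recommends)."""
    pooled = [rubin(e, v) for e, v in
              zip(per_study_estimates, per_study_variances)]
    return iv_meta([q for q, _ in pooled], [t for _, t in pooled])

# Two hypothetical studies, each with M = 3 per-imputation estimates/variances.
est = [[0.32, 0.29, 0.35], [0.41, 0.44, 0.38]]
var = [[0.010, 0.011, 0.009], [0.020, 0.018, 0.022]]
print(impute_then_meta(est, var))
```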

10.
What is the influence of various methods of handling missing data (complete case analysis, single imputation within and over trials, and multiple imputation within and over trials) on the subgroup effects of individual patient data meta-analyses? An empirical data set was used to compare these five methods with respect to the subgroup results. Logistic regression analyses were used to determine interaction effects (regression coefficients, standard errors, and p values) between subgrouping variables and treatment. Stratified analyses were performed to determine the effects in subgroups (rate ratios, rate differences, and their 95% confidence intervals). Imputation over trials resulted in different regression coefficients and standard errors for the interaction term compared with imputation within trials and complete case analysis. Significant interaction effects were found for complete case analysis and imputation within trials, whereas imputation over trials often showed no significant interaction effect. Imputation of missing data over trials may lead to bias, because the associations among covariates may differ across the included studies. Therefore, despite the gain in statistical power, imputation over trials is not recommended. In the authors' empirical example, imputation within trials appears to be the most appropriate approach for handling missing data in individual patient data meta-analyses.

11.
BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases. METHODS: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly skewed continuous covariate or in the binary exposure itself. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage the non-normality. Bias, precision, mean squared error, and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses. RESULTS: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision as more data were missing. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate's effect when the skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias, but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure. CONCLUSIONS: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.
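The zero-skewness log transformation mentioned above chooses a shift so that the transformed variable has zero sample skewness. A brief sketch in the spirit of Stata's lnskew0 follows; this is an assumed implementation using scipy, not code from the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import skew

def zero_skew_log(x):
    """Find a shift c so that log(x + c) has (approximately) zero sample
    skewness; a sketch in the spirit of Stata's lnskew0."""
    lo = -x.min() + 1e-8          # smallest shift keeping x + c positive
    hi = lo + 1e4                 # large shift: skewness approaches skew(x) > 0
    c = brentq(lambda c: skew(np.log(x + c)), lo, hi)
    return np.log(x + c), c

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)    # a highly skewed, positive covariate
z, c = zero_skew_log(x)
print(f"shift c = {c:.4f}, skewness after transform = {skew(z):.2e}")
```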

12.
The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables, which formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software. Copyright © 2014 John Wiley & Sons, Ltd.

13.
OBJECTIVE: To illustrate methods for handling incomplete data in health research. METHODS: Two strategies for handling missing data are presented: complete-case analysis and imputation. The imputation methods used were mean imputation, regression imputation, and multiple imputation. These strategies are illustrated in the context of logistic regression through an example using data from the "Second Cuban national survey on risk factors and non communicable disease", carried out in 2001. RESULTS: The results obtained via mean and regression imputation were similar; both overestimated the odds ratios by 10%. The results of complete-case analysis showed the greatest difference from the reference odds ratios, varying between 2% and 65%. These three methods also distorted the relationship between age and hypertension. Multiple imputation produced estimates closest to the reference estimates, varying by less than 16%, and was the only procedure that preserved the relationship between age and hypertension. CONCLUSIONS: Selecting a method for handling missing data is difficult, since the same procedure can give precise estimates in some circumstances and not in others. Complete-case analysis should be used with caution because of the substantial loss of information it produces. Mean and regression imputation produce unreliable estimates under missing at random (MAR) mechanisms.

14.
The purpose of this paper was to illustrate the influence of missing data on the results of longitudinal statistical analyses [i.e., MANOVA for repeated measurements and Generalised Estimating Equations (GEE)] and the influence of using different imputation methods to replace the missing data. Besides a complete dataset, four incomplete datasets were considered: two datasets with 10% missing data and two datasets with 25% missing data. In both situations, missingness was generated either independently of or dependent on observed data. Imputation methods were divided into cross-sectional methods (i.e., mean of series, hot deck, and cross-sectional regression) and longitudinal methods (i.e., last value carried forward, longitudinal interpolation, and longitudinal regression). In addition, the multiple imputation method was applied and discussed. The analyses were performed on a particular (observational) longitudinal dataset, with particular missing data patterns and imputation methods. The results of this illustration show that when MANOVA for repeated measurements is used, imputation methods are highly recommendable, because MANOVA, as implemented in the software used, applies listwise deletion of cases with a missing value. For GEE analysis, imputation methods were not necessary. When imputation methods were used, the longitudinal imputation methods were often preferable to the cross-sectional imputation methods, in that their point estimates and standard errors were closer to the estimates derived from the complete dataset. Furthermore, this study showed that the theoretically more valid multiple imputation method did not lead to point estimates different from those of the simpler (longitudinal) imputation methods. However, its estimated standard errors were theoretically more adequate, because they reflect the uncertainty in estimation caused by the missing values.
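Two of the longitudinal single-imputation methods compared here map directly onto pandas one-liners; a minimal sketch on invented data:

```python
import numpy as np
import pandas as pd

# One subject's outcome over five waves, with waves 2 and 4 missing.
y = pd.Series([10.0, np.nan, 14.0, np.nan, 18.0],
              index=[1, 2, 3, 4, 5], name="outcome")

locf = y.ffill()                        # last value carried forward
interp = y.interpolate(method="index")  # longitudinal (linear) interpolation
print(pd.DataFrame({"observed": y, "LOCF": locf, "interpolated": interp}))
```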

15.
Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). This assumption, however, is not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. The availability of such auxiliary information provides an opportunity to test the MAR assumption. If the MAR assumption is violated, this information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood-based joint modeling approach is consistent and most efficient when correctly specified. However, mis-specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis-specified, though it is more robust to mis-specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study. Copyright © 2009 John Wiley & Sons, Ltd.

16.
A multiple imputation strategy for incomplete longitudinal data
Longitudinal studies are commonly used to study processes of change. Because data are collected over time, missing data are pervasive in longitudinal studies, and complete ascertainment of all variables is rare. In this paper, a new imputation strategy for completing longitudinal data sets is proposed. The proposed methodology makes use of shrinkage estimators for pooling information across geographic entities, and of model averaging for pooling predictions across different statistical models. Bayes factors are used to compute weights (probabilities) for a set of models considered to be reasonable for at least some of the units for which imputations must be produced; imputations are produced by draws from the predictive distributions of the missing data; and multiple imputations are used to better reflect selected sources of uncertainty in the imputation process. The imputation strategy is developed within the context of an application to completing incomplete longitudinal variables in the so-called Area Resource File. The proposed procedure is compared with several other imputation procedures in terms of the inferences derived from the imputations, and the proposed methodology is demonstrated to provide valid estimates of model parameters when the completed data are analysed. Extensions to other missing data problems in longitudinal studies are straightforward, so long as the missing data mechanism can be assumed to be ignorable.

17.
When using 'intent-to-treat' approaches to compare outcomes between groups in clinical trials, analysts face a decision regarding how to account for missing observations. Most model-based approaches can be summarized as a process whereby the analyst makes assumptions about the distribution of the missing data in an attempt to obtain unbiased estimates that are based on functions of the observed data. An alternative approach that continues to appear in the applied literature, although Rubin pointed out that it often leads to biased variance estimates, is fixed-value imputation of means for missing observations. The purpose of this paper is to illustrate how several fixed-value mean imputation schemes can be formulated in terms of general linear models that characterize the means of the distributions of missing observations in terms of the means of the distributions of observed data. We show that several fixed-value imputation strategies result in estimated intervention effects that correspond to maximum likelihood estimates obtained under analogous assumptions. If the missing data process has been correctly characterized, hypothesis tests based on variances estimated using maximum likelihood techniques asymptotically have the correct size. In contrast, hypothesis tests performed using the uncorrected variance, obtained by applying standard complete-data formulae to singly imputed data, can be either conservative or anticonservative. Surprisingly, under several non-ignorable non-response scenarios, maximum-likelihood-based analyses can yield hypothesis tests equivalent to those obtained when analysing only the observed data.
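The variance distortion from fixed-value mean imputation is easy to demonstrate: filling in the observed mean shrinks the sample variance while inflating the apparent sample size, so the uncorrected standard error comes out too small. A toy simulation (illustrative only, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=200)
y_obs = y.copy()
y_obs[rng.random(200) < 0.4] = np.nan            # 40% missing completely at random

observed = y_obs[~np.isnan(y_obs)]
filled = np.where(np.isnan(y_obs), observed.mean(), y_obs)

# Standard complete-data SE applied to the singly mean-imputed sample ...
se_imputed = filled.std(ddof=1) / np.sqrt(len(filled))
# ... versus the SE computed from the observed cases alone.
se_observed = observed.std(ddof=1) / np.sqrt(len(observed))
print(f"SE, mean-imputed: {se_imputed:.3f}; SE, observed only: {se_observed:.3f}")
```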

18.
Genotype imputation infers the genotypes of untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel, such as those from the HapMap Project. It is popular for increasing statistical power and for comparing results across studies that use different platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and because no ideal reference panel is available due to admixture. In this paper, we evaluated three imputation strategies for African Americans. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged the results from two separate imputations, one using CEU and the other using YRI. Because investigators are increasingly using data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on the Affymetrix SNP Array 6.0, genotyped in 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations: about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield with the next-highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, reducing the accuracy of the union strategy. Our findings suggest that 1KG-based imputations can facilitate the discovery of significant associations for SNPs across the whole MAF spectrum. Because the 1KG Project is still under way, we expect that later versions will provide better imputation performance.

19.
Attrition threatens the internal validity of cohort studies. Epidemiologists use various imputation and weighting methods to limit bias due to attrition, but the ability of these methods to correct for attrition bias has not been tested. We simulated a cohort of 300 subjects, using 500 computer replications, to determine whether regression imputation, individual weighting, or multiple imputation is useful for reducing attrition bias, and compared these results with a complete subject analysis. Our logistic regression model included a binary exposure and two confounders. We generated 10%, 25%, and 40% attrition through three missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), and we used four covariance matrices to vary the attrition. We compared the true and estimated mean odds ratios (ORs), standard deviations (SDs), and coverage. With data MCAR or MAR, at all attrition rates, the complete subject analysis produced results at least as valid as those from the imputation and weighting methods. With data MNAR, no method provided unbiased estimates of the OR at attrition rates of 25% or 40%. When observations are not MAR or MCAR, imputation and weighting methods may not effectively reduce attrition bias.
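The three missingness mechanisms used in this simulation can be generated as follows; this is a schematic with hypothetical dropout models, not a reproduction of the paper's four covariance matrices.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300                                    # cohort size used in the abstract
x = rng.normal(size=n)                     # an observed covariate
y = rng.normal(loc=x, size=n)              # the outcome subject to attrition

def apply_attrition(prob):
    """Set y to missing wherever dropout is realized with probability prob."""
    return np.where(rng.random(n) < prob, np.nan, y)

p_mcar = np.full(n, 0.25)                  # MCAR: constant dropout probability
p_mar = 1 / (1 + np.exp(-(x - 1)))         # MAR: depends on the observed x only
p_mnar = 1 / (1 + np.exp(-(y - 1)))        # MNAR: depends on the outcome itself

for label, p in [("MCAR", p_mcar), ("MAR", p_mar), ("MNAR", p_mnar)]:
    print(label, f"attrition = {np.isnan(apply_attrition(p)).mean():.2f}")
```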

20.
OBJECTIVE: The purpose of this study was to assess an alternative statistical approach, multiple imputation, to risk factor redistribution in the national human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) surveillance system as a way to adjust for missing risk factor information. METHODS: We used an approximate model incorporating random variation to impute values for missing risk factors for HIV and AIDS cases diagnosed from 2000 to 2004. The process was repeated M times to generate M datasets. We combined the results from these datasets to compute an overall multiple imputation estimate and standard error (SE), and then compared the results from multiple imputation and from risk factor redistribution. Variables in the imputation models were age at diagnosis, race/ethnicity, type of facility where the diagnosis was made, region of residence, national origin, CD4 T-lymphocyte count within six months of diagnosis, and reporting year. RESULTS: In the HIV data, male-to-male sexual contact accounted for 67.3% of cases by risk factor redistribution and 70.4% (SE = 0.45) by multiple imputation. Also among males, injection drug use (IDU) accounted for 11.6% and 10.8% (SE = 0.34), and high-risk heterosexual contact for 15.1% and 13.0% (SE = 0.34), by risk factor redistribution and multiple imputation, respectively. Among females, IDU accounted for 18.2% and 17.9% (SE = 0.61), and high-risk heterosexual contact for 80.8% and 80.9% (SE = 0.63), by risk factor redistribution and multiple imputation, respectively. CONCLUSIONS: Because multiple imputation produces less biased subgroup estimates and offers objectivity and a semiautomated approach, we suggest consideration of its use in adjusting for missing risk factor information.
