19 similar documents found.
1.
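Objective: To apply several imputation methods to mixed-type missing data and evaluate how well they perform. Methods: Using real data, missingness was simulated at different proportions (10%, 20%, 30%, 50%). Four methods were used for imputation: MissForest, factor analysis of mixed data (FAMD), K-nearest neighbour imputation (KNN), and multivariate imputation by chained equations (MICE). Performance was compared using the proportion of falsely classified entries (PFC), the normalized root mean squared error (NRMSE), and the estimated regression coefficients. Results: Compared with MissForest, FAMD performed better at imputing categorical variables. At 10% missingness, FAMD and MissForest outperformed KNN and MICE. At 20%, FAMD was clearly better than the other three methods, although MissForest also performed acceptably. At 30%, the performance of all four methods dropped markedly and none gave satisfactory results. At 50%, FAMD still met the criterion for good performance on two variables but had large estimation errors for some others, while the other three methods failed entirely. Conclusion: FAMD performed best overall and may be considered the first choice for mixed-type missing data.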
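The two error measures named above can be made concrete. The sketch below assumes the NRMSE definition used in the MissForest literature (mean squared error divided by the variance of the true values, both computed over the imputed entries only); the abstract does not state which normalization it used, and the function names `nrmse` and `pfc` are hypothetical.

```python
import numpy as np

def nrmse(x_true, x_imp, mask):
    """Normalized RMSE over the imputed (originally missing) continuous entries.

    Assumed definition: sqrt(mean((x_true - x_imp)^2) / var(x_true)), where the
    mean and the variance are both taken over the entries where mask is True.
    """
    t = np.asarray(x_true, dtype=float)[mask]
    p = np.asarray(x_imp, dtype=float)[mask]
    return float(np.sqrt(np.mean((t - p) ** 2) / np.var(t)))

def pfc(x_true, x_imp, mask):
    """Proportion of falsely classified entries among imputed categorical values."""
    t = np.asarray(x_true)[mask]
    p = np.asarray(x_imp)[mask]
    return float(np.mean(t != p))

# True = the value was deleted in the simulation and then imputed
mask = np.array([False, True, True, False])
print(nrmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 2.0, 4.0], mask))  # sqrt(2.5), about 1.581
print(pfc(["a", "b", "b", "c"], ["a", "c", "b", "c"], mask))     # 0.5
```

Lower is better for both measures; NRMSE is unit-free, which lets errors be compared across continuous variables measured on different scales.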
2.
3.
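Mechanisms of missing data in survey research and methods for handling them  Total citations: 1 (self-citations: 0; citations by others: 1)
Missing data are a very common problem in survey research, especially in large-scale population surveys, and can to some extent undermine the validity of study results. For example, respondents refusing or forgetting to answer one or more questions, lost files, and inaccurately recorded data all cause data to be missing or lost. Missingness can occur at the item level and at the respondent level: a missing outcome variable is a typical case of item missingness (item nonresponse), whereas respondents who were originally selected but for some reason could not take part produce missingness at the respondent level. When data collected at great expense contain many missing values, we should not abandon analysis of the data because values are missing, nor abandon data collection because missingness might occur, nor postpone research until a perfect data-collection method has been developed. We must therefore decide how to handle and analyze data that contain missing values. The following focuses on the classification of missing data in survey research and on the methods for handling it.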
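The abstract does not spell out its classification, but the standard taxonomy (due to Rubin) distinguishes data missing completely at random (MCAR), missing at random (MAR, missingness depends only on observed values), and missing not at random (MNAR). A minimal simulation sketch, assuming that taxonomy and using hypothetical helper names, shows why the mechanism matters: the complete-case mean stays close to the truth under MCAR but is biased under MAR. MNAR is not simulated here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcar_mask(n, p, rng):
    """MCAR: every value has the same probability p of being missing."""
    return rng.random(n) < p

def mar_mask(x_obs, p_low, p_high, rng):
    """MAR: missingness of y depends only on a fully observed covariate x.

    Units with x above its median are missing with probability p_high,
    all other units with probability p_low.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    prob = np.where(x_obs > np.median(x_obs), p_high, p_low)
    return rng.random(x_obs.size) < prob

n = 10_000
x = rng.normal(size=n)            # always observed
y = 2.0 * x + rng.normal(size=n)  # the variable that becomes incomplete

m_mcar = mcar_mask(n, 0.3, rng)
m_mar = mar_mask(x, 0.1, 0.5, rng)

# Complete-case mean of y: close to the truth under MCAR, pulled down under MAR
print("full-data mean:        ", round(y.mean(), 3))
print("complete cases (MCAR): ", round(y[~m_mcar].mean(), 3))
print("complete cases (MAR):  ", round(y[~m_mar].mean(), 3))
```

Under MAR the bias arises because high-x (and therefore high-y) units are more often missing; methods that condition on x, such as multiple imputation, can correct for this, whereas simple deletion cannot.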
4.
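Objective: Missing data are almost unavoidable in cohort studies. This simulation study compares eight commonly used methods for handling longitudinal missing data, to provide a useful reference for practice. Methods: The simulation was implemented in R. Longitudinal data with missing values were generated by Monte Carlo methods, and the methods were compared by mean absolute deviation, mean relative deviation, and the type I error rate of the subsequent regression analysis, to evaluate both imputation accuracy and the effect on multivariable analysis. Results: Mean imputation, k-nearest neighbour imputation (KNN), regression imputation, and random forest performed similarly and stably; multiple imputation and hot-deck imputation ranked next; K-means clustering and the EM algorithm performed worst and were the least stable. Mean imputation, the EM algorithm, random forest, KNN, and regression imputation controlled the type I error well, whereas multiple imputation, hot-deck imputation, and K-means clustering did not. Conclusion: For longitudinal data missing at random, mean imputation, KNN, regression imputation, and random forest are all good choices; when the missing proportion is not too large, multiple imputation and hot-deck imputation also perform well; K-means clustering and the EM algorithm are not recommended.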
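A bare-bones version of k-nearest-neighbour imputation, one of the methods compared above, is sketched below in Python (the study itself used R, and its KNN settings are not given). Assumptions: the distance between two rows is the root-mean-square difference over the columns observed in both, variables are not rescaled, and each missing cell is filled with the mean of its k nearest donors that have that column observed. `knn_impute` is a hypothetical name; in practice a library implementation such as scikit-learn's `KNNImputer` would be used.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs using the mean of the k nearest rows that observe the column."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n = X.shape[0]
    for i in range(n):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        # Distance to every other row, over the columns both rows observe
        dists = []
        for j in range(n):
            if j == i:
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if shared.any():
                d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
                dists.append((d, j))
        dists.sort()  # nearest first
        for c in np.where(miss)[0]:
            donors = [X[j, c] for _, j in dists if not np.isnan(X[j, c])][:k]
            if donors:
                out[i, c] = np.mean(donors)
    return out

X = np.array([
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
])
print(knn_impute(X, k=2))  # the NaN is replaced by the mean of 3.0 and 3.2
```

Because the distance is unscaled, variables with large ranges dominate; standardizing columns first is a common refinement.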
5.
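A review of missing data and imputation methods  Total citations: 6 (self-citations: 0; citations by others: 6)
In social survey data, the most common problem is missing data. Missingness arises from causes such as loss to follow-up, nonresponse, and unusable answers. In statistics, a record that contains missing data is called an incomplete observation. Missing data, or incomplete observations, can have a large effect on survey research. Therefore, to make fuller use of the data already collected, statisticians in China and abroad have …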
6.
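Objective: To compare several imputation methods suitable for longitudinal missing data and to select the best one for imputing missing values in follow-up data on Alzheimer's disease. Methods: For longitudinal data that are missing at random with continuous incomplete variables, random data sets with missing proportions of 10%, 20%, 30%, 40%, and 50% were simulated and imputed by last observation carried forward (LOCF), Markov chain Monte Carlo imputation (MCMC), and fully conditional specification (FCS). The methods were compared on unbiasedness and efficiency, the best one was selected, and it was then used to impute systolic blood pressure and Montreal Cognitive Assessment (MoCA) scores in a follow-up study of Alzheimer's disease. Results: (1) When the time variable was ignored and several continuous variables were missing, MCMC had a clear advantage at every missing rate; LOCF was somewhat effective at low missing rates and is simple to apply, whereas FCS gave poor results throughout. When missingness was severe (missing rate above 40%), all methods performed poorly. (2) When MCMC was applied to the missing follow-up data on Alzheimer's disease, both systolic blood pressure and MoCA score were imputed best with 3 imputations. Conclusion: To obtain the best results, both the imputation method and an appropriate number of imputations must be considered when handling missing data.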
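LOCF, the simplest of the three methods above, is easy to state in code. A minimal sketch, assuming one row per subject and one column per visit with NaN marking missed visits; `locf` is a hypothetical name and the blood pressure values are made up for illustration. A value missing at the first visit has nothing to carry forward and stays missing.

```python
import numpy as np

def locf(visits):
    """Last observation carried forward along each subject's row of visits."""
    out = np.array(visits, dtype=float)
    for row in out:  # each row is a view, so edits write into `out`
        last = np.nan
        for t in range(row.size):
            if np.isnan(row[t]):
                row[t] = last  # reuse the most recent observed value
            else:
                last = row[t]
    return out

# Systolic blood pressure at four visits for two hypothetical subjects
sbp = [
    [120.0, np.nan, 130.0, np.nan],
    [np.nan, 140.0, np.nan, np.nan],
]
print(locf(sbp))
# [[120. 120. 130. 130.]
#  [ nan 140. 140. 140.]]
```

LOCF implicitly assumes the outcome stops changing after the last observation, which is why it can work at low missing rates but degrades as gaps grow.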
7.
8.
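Objective: To use simulation to impute randomly missing values in periodic time series, and to compare a time-series imputation method based on periodic information (the periodic imputation method) with spline interpolation. Methods: Stationary periodic time series were simulated in SAS and random missingness was introduced. The two methods were compared at the same series length with different missing proportions, and at the same missing proportion with different series lengths. Imputation error was quantified by NRMSE and RMSE. Results: At the same series length, the imputation error of both methods increased with the missing proportion. Except for RMSE at 30% missingness, where the difference between the methods was not statistically significant, the NRMSE and RMSE of the periodic method were smaller than those of spline interpolation (P < 0.05). At the same missing proportion, the two methods did not differ significantly for short series; for longer series, the periodic method outperformed spline interpolation. Conclusion: Overall, the periodic imputation method performs better for missing data in time series with a definite periodicity.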
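The abstract does not describe the periodic method's algorithm. One simple reading, assumed here, fills each missing point with the mean of the observed values at the same phase (time index modulo the period). To keep the example dependency-free, linear interpolation via `numpy.interp` stands in for the spline baseline, so the numbers are not comparable to the study's. All function names are hypothetical.

```python
import numpy as np

def periodic_impute(y, period):
    """Replace each NaN with the mean of observed values at the same phase."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    phase = np.arange(y.size) % period
    for p in range(period):
        at_phase = phase == p
        observed = at_phase & ~np.isnan(y)
        if observed.any():
            out[at_phase & np.isnan(y)] = y[observed].mean()
    return out

def linear_impute(y):
    """Baseline: linear interpolation between neighbouring observed points."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size)
    ok = ~np.isnan(y)
    out = y.copy()
    out[~ok] = np.interp(t[~ok], t[ok], y[ok])
    return out

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

rng = np.random.default_rng(1)
period = 12
t = np.arange(120)
y_true = 10.0 + 5.0 * np.sin(2 * np.pi * t / period)  # noiseless periodic series
missing = rng.random(t.size) < 0.2                     # about 20% random missingness
y_obs = np.where(missing, np.nan, y_true)

err_periodic = rmse(periodic_impute(y_obs, period)[missing], y_true[missing])
err_linear = rmse(linear_impute(y_obs)[missing], y_true[missing])
print(f"periodic RMSE: {err_periodic:.4f}, linear RMSE: {err_linear:.4f}")
```

On a noiseless series the phase mean recovers missing values almost exactly, while interpolation cuts across the curve; with real noise and trend the advantage shrinks, consistent with the study finding smaller gains for short series.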
9.
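Objective: To use simulation to impute consecutive (block) missing values in periodic time series, and to compare the periodic imputation method with spline interpolation for this type of missingness. Methods: Consecutive missingness was introduced into both simulated and real time series, and the two methods were compared for different numbers of consecutive missing values. Imputation error was quantified by NRMSE and RMSE. Results: Except at run lengths of 10 and …, the imputation error of the periodic method was smaller than that of spline interpolation as the number of consecutive missing values increased. The error of the periodic method showed no obvious fluctuation for runs of 5 to 30 consecutive missing values and stayed low throughout, whereas the error of spline interpolation rose markedly as the number of missing values grew. Conclusion: For time series with a definite periodicity, the periodic method imputes consecutive missing data better than spline interpolation; its error is stable and changes little as the run of missing values lengthens.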
10.
11.
Burns RA Butterworth P Kiely KM Bielak AA Luszcz MA Mitchell P Christensen H Von Sanden C Anstey KJ 《Journal of clinical epidemiology》2011,64(7):787-793
Objective
The Mini-Mental State Examination (MMSE) is used to estimate current cognitive status and as a screen for possible dementia. Missing item-level data are commonly reported. Attention to missing data is particularly important. However, there are concerns that common procedures for dealing with missing data, for example, listwise deletion and mean item substitution, are inadequate.
Study Design and Setting
We used multiple imputation (MI) to estimate missing MMSE data in 17,303 participants who were drawn from the Dynamic Analyses to Optimize Aging project, a harmonization project of nine Australian longitudinal studies of aging.
Results
Our results indicated differences in mean MMSE scores between those participants with and without missing data, a pattern consistent over age and gender levels. MI inflated MMSE scores, but differences between those imputed and those without missing data still existed. A simulation model supported the efficacy of MI to estimate missing item level, although serious decrements in estimation occurred when 50% or more of item-level data were missing, particularly for the oldest participants.
Conclusions
Our adaptation of MI to obtain a probable estimate for missing MMSE item level data provides a suitable method when the proportion of missing item-level data is not excessive.
12.
Suzie Cro Tim P. Morris Michael G. Kenward James R. Carpenter 《Statistics in medicine》2020,39(21):2815-2842
Missing data due to loss to follow-up or intercurrent events are unintended, but unfortunately inevitable in clinical trials. Since the true values of missing data are never known, it is necessary to assess the impact of untestable and unavoidable assumptions about any unobserved data in sensitivity analysis. This tutorial provides an overview of controlled multiple imputation (MI) techniques and a practical guide to their use for sensitivity analysis of trials with missing continuous outcome data. These include δ- and reference-based MI procedures. In δ-based imputation, an offset term, δ, is typically added to the expected value of the missing data to assess the impact of unobserved participants having a worse or better response than those observed. Reference-based imputation draws imputed values with some reference to observed data in other groups of the trial, typically in other treatment arms. We illustrate the accessibility of these methods using data from a pediatric eczema trial and a chronic headache trial and provide Stata code to facilitate adoption. We discuss issues surrounding the choice of δ in δ-based sensitivity analysis. We also review the debate on variance estimation within reference-based analysis and justify the use of Rubin's variance estimator in this setting, since, as we elaborate within, it provides information-anchored inference.
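The δ-based idea and Rubin's variance estimator can be illustrated together for the simplest target, the mean of a continuous outcome. The sketch is in Python rather than the tutorial's Stata and is not its code. Assumptions: a normal imputation model with a flat prior (parameters are redrawn for each imputation), missing values shifted by δ, and completed-data results pooled with Rubin's rules. Reference-based MI is not shown; `rubin_pool` and `delta_mi_mean` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)

def rubin_pool(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()             # pooled point estimate
    w = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)            # between-imputation variance
    t = w + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, t

def delta_mi_mean(y, delta, m, rng):
    """δ-adjusted multiple imputation of the mean of y (NaN = missing)."""
    y = np.asarray(y, dtype=float)
    obs = y[~np.isnan(y)]
    n_obs, n_mis = obs.size, int(np.isnan(y).sum())
    ests, variances = [], []
    for _ in range(m):
        # Draw sigma^2 and mu from their posterior under a normal model, flat prior
        sigma2 = (n_obs - 1) * obs.var(ddof=1) / rng.chisquare(n_obs - 1)
        mu = rng.normal(obs.mean(), np.sqrt(sigma2 / n_obs))
        # Shift imputations by delta: delta < 0 means unobserved participants do worse
        imputed = rng.normal(mu + delta, np.sqrt(sigma2), size=n_mis)
        full = np.concatenate([obs, imputed])
        ests.append(full.mean())
        variances.append(full.var(ddof=1) / full.size)
    return rubin_pool(ests, variances)

y = rng.normal(10.0, 2.0, size=200)
y[:50] = np.nan  # 25% missing outcome, for illustration only
for delta in (0.0, -1.0, -2.0):
    est, total_var = delta_mi_mean(y, delta, m=20, rng=rng)
    print(f"delta={delta:+.1f}: mean={est:.2f}, SE={np.sqrt(total_var):.3f}")
```

Scanning a range of δ values shows how far the conclusion moves as unobserved participants are assumed to respond increasingly worse than observed ones; with 25% missing, each unit of δ shifts the pooled mean by roughly 0.25.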
13.
Missing covariate data present a challenge to tree-structured methodology because a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults.
14.
Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice, since ordinary statistical methods for a complete data set can be used once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have also been proposed for MI, but it is not straightforward to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset.
15.
Analysis of incomplete quality of life data in advanced stage cancer: A practical application of multiple imputation  Total citations: 2 (self-citations: 0; citations by others: 2)
This paper presents a practical approach to analyzing incomplete quality of life (QOL) data that contain non-ignorable dropouts in patients with advanced non-small-cell lung cancer (NSCLC). QOL scores for the physical domain at baseline and at the end of the first and second courses of chemotherapy were compared between two treatment groups in a phase III trial. One hundred and 103 eligible patients were randomized to receive cisplatin and irinotecan (CPT-P) or cisplatin and vindesine, respectively; of those two groups, 83 and 85, respectively, completed a QOL questionnaire at least at baseline. Multiple imputation incorporating auxiliary QOL variables was implemented as one of several sensitivity analyses, alongside complete-case, available-case, and pattern-mixture analyses. Although larger sensitivity to missing data was found for CPT-P treatment, none of the analyses demonstrated a significant difference in estimated slopes over time between the groups. This study presents an analytical approach for dealing with the complex problem of missing QOL data. It must be noted, however, that the validity of the multiple imputation method we present is not certain unless sufficiently informative auxiliary variables can be specified to convert non-ignorable missingness to ignorable missingness.
16.
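Objective: To explore how well different imputation methods handle missing viral load (VL) data among HIV-infected men who have sex with men (MSM). Methods: Based on VL test data from a 2013 sample of HIV-infected MSM in 16 large Chinese cities, a complete data set and five data sets with different types of missingness were simulated in SPSS 17.0. The five incomplete VL data sets were handled by expectation maximization (EM), regression imputation, mean imputation, deletion, and Markov chain Monte Carlo (MCMC), and the results were compared in terms of data distribution, accuracy, and precision. Results: The VL data were skewed and non-continuous and could not be effectively transformed to a normal distribution. All methods performed well for data missing completely at random. For the other types of missingness, regression imputation and MCMC better preserved the main distributional features of the complete data; EM, regression imputation, mean imputation, and deletion generally underestimated the mean, whereas MCMC mostly overestimated it. Conclusion: MCMC can be the first-choice method for imputing missing VL data after log transformation. The imputed data can serve as a reference for estimating the mean VL level of the surveyed population.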
17.
《Journal of clinical epidemiology》2014,67(3):335-342
Objectives
Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases when some, many, or all item scores are missing in a multi-item instrument.
Study Design and Setting
Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the optimal technique for each scenario. Fitted regression coefficients were compared using the bias and coverage as performance parameters.
Results
Mean imputation caused biased estimates in every missing data scenario when data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score.
Conclusion
We recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data.
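One mechanism behind the warning against mean imputation can be shown in a few lines: filling every gap with the observed mean shrinks the variable's variance by roughly the missing fraction and weakens its correlation with other variables. This minimal sketch assumes a single continuous score with MCAR missingness; it does not reproduce the study's item-level regression scenarios, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p_missing = 5_000, 0.3

x = rng.normal(50.0, 10.0, size=n)           # e.g. a total score
y = 0.5 * x + rng.normal(0.0, 5.0, size=n)   # outcome correlated with x

x_obs = x.copy()
x_obs[rng.random(n) < p_missing] = np.nan    # MCAR missingness

# Mean imputation: every missing value gets the mean of the observed values
x_mean_imp = np.where(np.isnan(x_obs), np.nanmean(x_obs), x_obs)

var_ratio = x_mean_imp.var() / x.var()
r_true = np.corrcoef(x, y)[0, 1]
r_imp = np.corrcoef(x_mean_imp, y)[0, 1]

print(f"variance ratio after mean imputation: {var_ratio:.2f}")  # roughly 1 - p_missing
print(f"corr(x, y): {r_true:.2f} -> {r_imp:.2f}")               # shrinks by roughly sqrt(1 - p_missing)
```

The imputed values carry no information about y, so standard errors computed as if they were real observations are too small; multiple imputation avoids this by drawing plausible values and propagating their uncertainty.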
18.
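Objective: To briefly introduce the use of MICE (multivariate imputation by chained equations) in R and to evaluate its imputation performance. Methods: The imputation workflow is illustrated with real data, and the performance of MICE is compared with that of common missing-data methods (deletion, mean/mode imputation, and regression imputation). Results: At a missing rate of 10%, MICE and the common methods gave similar results, with relative errors of about 10% in the estimated regression coefficients of the three variables for every method. As the missing rate increased (20%, 40%), the relative errors rose for all methods, but those of MICE stayed at about 10% to 20% for the three variables; MICE performed better than the other methods and was stable, followed by regression imputation, while deletion and mean/mode imputation performed worse. At a missing rate of 50%, the estimation errors for all three types of variables were already large, and no method performed satisfactorily. Conclusion: MICE is easier to use than other multiple imputation software and, compared with common missing-data methods, makes fuller use of the information in incomplete records and reflects the true survey situation more accurately; it is worth adopting in practice.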
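The chained-equations idea behind MICE can be sketched as a loop: start from mean-filled data, then repeatedly regress each incomplete variable on all the others and replace its missing entries with predictions plus random residual noise. This is a simplified Python illustration, not the R `mice` package: it uses ordinary least squares only, omits the posterior draw of regression parameters that proper MICE performs, and returns a single completed data set. `mice_sketch` is a hypothetical name.

```python
import numpy as np

def mice_sketch(X, n_iter=10, rng=None):
    """One completed data set from a bare-bones chained-equations loop."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            # Design matrix: intercept plus the current values of all other columns
            A = np.column_stack([np.ones(len(X)), np.delete(filled, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid_sd = np.std(X[obs, j] - A[obs] @ beta, ddof=A.shape[1])
            # Prediction plus noise, so imputations keep the variable's spread
            n_mis = int(miss[:, j].sum())
            filled[miss[:, j], j] = A[miss[:, j]] @ beta + rng.normal(0.0, resid_sd, n_mis)
    return filled

rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
data = np.column_stack([x, y])
data[rng.random(n) < 0.2, 0] = np.nan  # about 20% of x missing (MCAR)
data[rng.random(n) < 0.2, 1] = np.nan  # about 20% of y missing (MCAR)

completed = mice_sketch(data, n_iter=10, rng=rng)
slope = np.polyfit(completed[:, 0], completed[:, 1], 1)[0]
print(f"slope of y on x after imputation: {slope:.2f} (true value 2.0)")
```

Running the loop m times with different random draws and pooling the m analyses with Rubin's rules turns this single imputation into multiple imputation.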
19.
Matthieu Resche‐Rigon Ian R. White Jonathan W. Bartlett Sanne A.E. Peters Simon G. Thompson 《Statistics in medicine》2013,32(28):4890-4905
A variable is ‘systematically missing’ if it is missing for all individuals within particular studies in an individual participant data meta‐analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure–disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta‐analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between‐study random effects in the imputation model. Copyright © 2013 John Wiley & Sons, Ltd.