
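Comparison of different missing-value handling methods on data missing at random
Cite this article: HUA Linlin, SHI Nian, YANG Yongli, ZHAO Tianyi, SHI Xuezhong. Comparison of different missing-value handling methods on data missing at random [J]. Journal of Zhengzhou University (Medical Sciences), 2012, 47(3): 315-318.
Authors: HUA Linlin  SHI Nian  YANG Yongli  ZHAO Tianyi  SHI Xuezhong
Affiliations: HUA Linlin (Department of Health Statistics, College of Public Health, Zhengzhou University, Zhengzhou 450001); SHI Nian (College of Basic Medical Sciences, Zhengzhou University, Zhengzhou 450001); YANG Yongli (Department of Health Statistics, College of Public Health, Zhengzhou University, Zhengzhou 450001); ZHAO Tianyi (School of Medicine, Shanghai Jiaotong University, Shanghai 200025); SHI Xuezhong (Department of Health Statistics, College of Public Health, Zhengzhou University, Zhengzhou 450001)
Funding: Supported by the National Key Science and Technology Program of the Tenth Five-Year Plan, grant 2004BA719A13-6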
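Abstract: Objective: To compare how well different missing-value handling methods perform on data missing at random. Methods: Based on hemoglobin, white blood cell, and blood urea nitrogen measurements from HIV/AIDS blood samples, SAS 9.1 was used to simulate a complete data set and data sets with different missing rates. The methods were compared in terms of precision, accuracy, and distribution. Results: At every missing rate, the hemoglobin and white blood cell data processed by each method did not differ significantly from the complete data set. At all missing rates, multiple imputation (MI) had the highest precision. At missing rates of 10% to 20%, MI gave the highest accuracy. At a missing rate of 30%, listwise deletion gave the highest accuracy. At missing rates of 40% or more, the accuracy of the imputed results was unstable. At all missing rates, the distributions of the data processed by regression imputation, by listwise deletion, and by MI with two imputations were consistent with that of the complete data set. Conclusion: When 10% to 20% of the data are missing, MI performs best; when 30% are missing, listwise deletion performs best; when 40% or more are missing, no method performs well.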

Keywords: continuous variable  missing value  missing at random

Comparison of different methods in dealing with missing values of missing at random
HUA Linlin, SHI Nian, YANG Yongli, ZHAO Tianyi, SHI Xuezhong. Comparison of different methods in dealing with missing values of missing at random [J]. Journal of Zhengzhou University: Med Sci, 2012, 47(3): 315-318.
Authors:HUA Linlin  SHI Nian  YANG Yongli  ZHAO Tianyi  SHI Xuezhong
Institution: 1) Department of Health Statistics, College of Public Health, Zhengzhou University, Zhengzhou 450001; 2) College of Basic Medical Sciences, Zhengzhou University, Zhengzhou 450001; 3) School of Medicine, Shanghai Jiaotong University, Shanghai 200025
Abstract: Aim: To compare the performance of different methods for handling values that are missing at random. Methods: SAS 9.1 was used to simulate a complete data set and data sets with different missing rates from HIV/AIDS blood specimen data (hemoglobin, white blood cells, and blood urea nitrogen). The methods were compared in terms of precision, accuracy, and distribution characteristics. Results: At every missing rate, the hemoglobin and white blood cell data processed by each method did not differ significantly from the complete data set. Multiple imputation (MI) had the best precision at all missing rates. When the missing rate was between 10% and 20%, MI had better accuracy than the other methods. When the missing rate was about 30%, listwise deletion had better accuracy than the other methods. When the missing rate was 40% or more, accuracy was unstable for every method. At all missing rates, data processed by regression imputation, listwise deletion, and MI with two imputations had distribution characteristics consistent with the complete data set. Conclusion: When the missing rate is between 10% and 20%, MI is the most suitable method. When the missing rate is about 30%, listwise deletion is more appropriate. When the missing rate is 40% or more, all methods perform poorly.
Keywords:continuous variable  missing value  missing at random
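
The simulate-then-compare design described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the study itself used SAS 9.1 on real blood-sample measurements, whereas this sketch generates synthetic data, imposes missingness whose probability depends only on an always-observed covariate `x`, and compares just two of the methods (listwise deletion and single regression imputation) by the error of the estimated mean against the complete data. Multiple imputation is not implemented here, the function names are hypothetical, and the error measure is not the paper's definition of accuracy or precision.

```python
import numpy as np

def simulate_complete(n, rng):
    """Synthetic complete data (hypothetical): x is always observed, y will be masked."""
    x = rng.normal(size=n)
    y = 130.0 + 15.0 * (0.6 * x + 0.8 * rng.normal(size=n))
    return x, y

def mar_mask(x, rate, rng):
    """Boolean mask, True where y is set missing.

    The missingness probability depends only on the observed x (missing at
    random) and averages `rate` over the sample; valid for rate <= 0.5.
    """
    ranks = (np.argsort(np.argsort(x)) + 0.5) / len(x)  # in (0, 1), mean 0.5
    return rng.random(len(x)) < 2.0 * rate * ranks

def listwise_deletion_mean(x, y, miss):
    """Mean of y using only the cases where y is observed."""
    return y[~miss].mean()

def regression_imputation_mean(x, y, miss):
    """Mean of y after filling missing values with predictions from y ~ x."""
    slope, intercept = np.polyfit(x[~miss], y[~miss], 1)
    filled = np.where(miss, intercept + slope * x, y)
    return filled.mean()

def compare(rate, n=5000, seed=0):
    """Error of each method's mean estimate relative to the complete-data mean."""
    rng = np.random.default_rng(seed)
    x, y = simulate_complete(n, rng)
    miss = mar_mask(x, rate, rng)
    truth = y.mean()
    return {
        "observed_missing_rate": miss.mean(),
        "deletion_error": listwise_deletion_mean(x, y, miss) - truth,
        "regression_error": regression_imputation_mean(x, y, miss) - truth,
    }

for rate in (0.1, 0.2, 0.3, 0.4):
    print(rate, compare(rate))
```

This setup only shows the mechanics of masking a complete data set at a chosen rate and scoring each method against it. It is not intended to reproduce the study's results, which depend on the real data and on the authors' own definitions of precision, accuracy, and distribution.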
This article is indexed in CNKI and other databases.