
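Application of a K-Nearest-Neighbor-Based Over-sampling Algorithm to Imbalanced Medical Data
Cite this article: Zhou Shudong, Zhang Lei, Li Lixia. Application of a K-nearest-neighbor-based over-sampling algorithm to imbalanced medical data [J]. Chinese Journal of Health Statistics, 2008, 25(6).
Authors: Zhou Shudong, Zhang Lei, Li Lixia
Affiliations: 1. Department of Health Statistics, School of Public Health, Guangdong Pharmaceutical University (广东药学院), 510310
2. School of Mathematics and Computational Science, Sun Yat-sen University
Funding: Guangdong Provincial Medical Research Fund
Abstract: Objective: To introduce the application of a K-nearest-neighbor-based over-sampling algorithm to the classification of imbalanced medical datasets. Methods: The K-nearest-neighbor method is first used to delete majority-class samples that are easily confused with the minority class during classification. The SMOTE algorithm is then applied to the resulting training set to expand the minority class, improving classification performance on the minority class. Results: Validated on chronic obstructive pulmonary disease data from a community population, the K-nearest-neighbor-based over-sampling algorithm classified better than both the synthetic minority over-sampling technique (SMOTE) and under-sampling. Conclusion: When medical data are imbalanced, traditional classifiers perform poorly. The K-nearest-neighbor-based over-sampling algorithm achieves good classification performance and has good application prospects in medical pattern recognition.

Keywords: K nearest neighbors; over-sampling; imbalanced; medical data; pattern recognition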

Application of Over-sampling Algorithm Based on K Nearest Neighbors in Imbalanced Medical Datasets Learning
Zhou Shudong, Zhang Lei, Li Lixia. Application of Over-sampling Algorithm Based on K Nearest Neighbors in Imbalanced Medical Datasets Learning [J]. Chinese Journal of Health Statistics, 2008, 25(6).
Authors: Zhou Shudong, Zhang Lei, Li Lixia
Abstract: Objective: To introduce the application of an over-sampling algorithm based on k nearest neighbors (KNN) to the classification of imbalanced medical datasets. Methods: First, KNN was applied to the dataset to delete the majority-class samples that are easily confused with minority-class samples. The SMOTE technique was then used to add examples to the minority class in the new training set, improving classification performance on the minority class. Results: Using COPD data from a community population, the new algorithm was compared with the SMOTE technique and an under-sampling method. The experimental results show that the new algorithm performs better than both. Conclusion: Traditional classifiers perform poorly on imbalanced medical datasets. The KNN-based over-sampling algorithm achieves better performance and has promising applications in medical research.
Keywords: K nearest neighbors; Over-sampling; Imbalanced; Medical datasets; Machine learning
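
The two-step procedure described in the abstract (KNN-based removal of confusable majority-class samples, followed by SMOTE on the cleaned training set) can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation. The abstract does not give the exact deletion rule, the value of k, or the target class ratio, so the following are assumptions made here: a majority sample is dropped when more than half of its k nearest neighbours belong to the minority class, distances are Euclidean, and the minority class is over-sampled to a 1:1 ratio. All function names and the toy data are hypothetical.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return indices of the k candidates closest to point (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def remove_confusable_majority(X, y, minority_label, k=3):
    """Step 1 (assumed rule): drop majority samples whose k nearest
    neighbours are mostly minority-class samples."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            others = [j for j in range(len(X)) if j != i]
            nn = k_nearest(xi, [X[j] for j in others], k)
            votes = sum(1 for n in nn if y[others[n]] == minority_label)
            if votes > k / 2:
                continue  # likely to be confused with the minority class
        keep_X.append(xi)
        keep_y.append(yi)
    return keep_X, keep_y


def smote(minority, n_new, k=3, rng=None):
    """Step 2: create n_new synthetic minority samples by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        nn = k_nearest(base, others, min(k, len(others)))
        neighbour = others[rng.choice(nn)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


def knn_then_smote(X, y, minority_label, k=3, rng=None):
    """Clean the majority class with KNN, then balance the classes with SMOTE."""
    X_clean, y_clean = remove_confusable_majority(X, y, minority_label, k)
    minority = [x for x, lab in zip(X_clean, y_clean) if lab == minority_label]
    n_new = max(0, (len(y_clean) - len(minority)) - len(minority))
    new_points = smote(minority, n_new, k, rng)
    return X_clean + new_points, y_clean + [minority_label] * len(new_points)


# Toy data (made up): three minority points, one majority point sitting
# inside the minority cluster, and six well-separated majority points.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
     (0.15, 0.15),
     (3.0, 3.0), (3.2, 3.1), (3.1, 2.9), (2.9, 3.2), (3.3, 3.3), (2.8, 2.8)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

X_res, y_res = knn_then_smote(X, y, minority_label=1, k=3,
                              rng=random.Random(42))
print(y_res.count(0), y_res.count(1))  # 6 6
```

On the toy data, the majority point at (0.15, 0.15) is removed in the first step because all three of its nearest neighbours are minority samples, and the second step then adds three synthetic minority points so that both classes have six samples. A classifier would be trained on `X_res`, `y_res` and evaluated on untouched test data.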
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.