Application of Over-sampling Algorithm Based on K Nearest Neighbors in Imbalanced Medical Datasets Learning

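Cite this article:	Zhou Shudong, Zhang Lei, Li Lixia. Application of Over-sampling Algorithm Based on K Nearest Neighbors in Imbalanced Medical Datasets Learning [J]. Chinese Journal of Health Statistics (中国卫生统计), 2008, 25(6).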

Authors:	Zhou Shudong, Zhang Lei, Li Lixia

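Affiliations:	1. Department of Health Statistics, School of Public Health, Guangdong College of Pharmacy (广东药学院), 510310; 2. School of Mathematics and Computational Science, Sun Yat-sen University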

Funding:	Guangdong Provincial Medical Research Fund

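Abstract (translated from the Chinese):	Objective: To introduce the application of an over-sampling algorithm based on K nearest neighbors (KNN) to the classification of imbalanced medical datasets. Methods: First, KNN is used to delete the majority-class samples that are easily confused with the minority class during classification. The SMOTE algorithm is then applied to the resulting training set to expand the minority-class samples, so as to improve classification performance on the minority class. Results: In a validation on chronic obstructive pulmonary disease (COPD) data from a community population, the KNN-based over-sampling algorithm showed better classification performance than the synthetic minority over-sampling technique (SMOTE) and the under-sampling method. Conclusion: When medical data are imbalanced, traditional classifiers perform poorly. The KNN-based over-sampling algorithm achieves good classification performance and has good application prospects in medical pattern recognition.
Keywords:	K nearest neighbors; over-sampling; imbalanced data; medical data; pattern recognition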
Application of Over-sampling Algorithm Based on K Nearest Neighbors in Imbalanced Medical Datasets Learning

Zhou Shudong, Zhang Lei, Li Lixia. Application of Over-sampling Algorithm Based on K Nearest Neighbors in Imbalanced Medical Datasets Learning [J]. Chinese Journal of Health Statistics, 2008, 25(6).

Authors:	Zhou Shudong, Zhang Lei, Li Lixia

Abstract:	Objective: To introduce the application of an over-sampling algorithm based on k nearest neighbors (KNN) to the classification of imbalanced medical datasets. Methods: First, KNN was applied to the dataset to delete the majority-class samples that are easily confused with minority-class samples. The SMOTE technique then added more examples to the minority class in the new training set to improve classification performance on the minority class. Results: Using COPD datasets from the community, the new algorithm was compared with the SMOTE technique and the under-sampling method. The experimental results show that the new algorithm performs better than the other two methods. Conclusion: Traditional classifiers perform poorly on imbalanced medical datasets. The over-sampling algorithm based on KNN achieves better performance and has good prospects for application in medical research.

Keywords:	K nearest neighbors; Over-sampling; Imbalanced; Medical datasets; Machine learning
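
The two-step procedure in the Methods (KNN-based deletion of confusable majority samples, followed by SMOTE on the cleaned training set) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact deletion rule, the value of k, or the target class ratio, so several choices here are assumptions of this sketch rather than the authors' implementation: a majority sample is dropped when more than half of its k nearest neighbours are minority, k defaults to 5, SMOTE is run until the two classes are equal in size, and the function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff * diff, axis=2)


def knn_clean_majority(X, y, minority_label, k=5):
    """Drop majority samples whose k nearest neighbours are mostly minority.

    The 'more than half' threshold is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = pairwise_sq_dists(X, X)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbours
    minority_votes = np.sum(y[nn] == minority_label, axis=1)
    drop = (y != minority_label) & (minority_votes > k / 2)
    return X[~drop], y[~drop]


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority points by SMOTE-style interpolation."""
    rng = np.random.default_rng() if rng is None else rng
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)             # cannot use more neighbours than exist
    d = pairwise_sq_dists(X_min, X_min)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)       # seed sample per new point
    neigh = nn[base, rng.integers(0, k, size=n_new)]     # one of its k neighbours
    gap = rng.random((n_new, 1))                         # position along the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def knn_then_smote(X, y, minority_label, k=5, rng=None):
    """Clean the majority class with KNN, then balance the classes with SMOTE."""
    y = np.asarray(y)
    X_c, y_c = knn_clean_majority(X, y, minority_label, k)
    X_min = X_c[y_c == minority_label]
    n_new = int(np.sum(y_c != minority_label)) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:       # nothing to add, or too few points to interpolate
        return X_c, y_c
    X_syn = smote(X_min, n_new, k, rng)
    y_syn = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X_c, X_syn]), np.concatenate([y_c, y_syn])
```

A call such as `knn_then_smote(X_train, y_train, minority_label=1)` would return a resampled training set to pass to any classifier. Resampling should be applied to the training data only, so that the test set keeps the original class distribution.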
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
