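Prediction of health literacy of the elderly based on SMOTE algorithm and machine learning
Cite this article: WANG Ke, ZHAO Hua-shuo, ZHANG Hong, HUANG Shui-ping, JIN Ying-liang, ZENG Ping. Prediction of health literacy of the elderly based on SMOTE algorithm and machine learning[J]. Chinese Journal of School Doctor, 2019, 33(9): 641.
Authors: WANG Ke, ZHAO Hua-shuo, ZHANG Hong, HUANG Shui-ping, JIN Ying-liang, ZENG Ping
Affiliations: 1. Department of Epidemiology and Health Statistics, School of Public Health, Xuzhou Medical University, Xuzhou 221004, Jiangsu, China; 2. Department of Critical Care Medicine, Xuzhou Children's Hospital
Funding: Philosophy and Social Science Research Project of Jiangsu Higher Education Institutions (2015SJD455)
Abstract: Objective To explore the use of the synthetic minority oversampling technique (SMOTE) combined with machine learning models for predicting whether older adults have adequate health literacy. Methods Univariate screening was used to select the variables associated with health literacy status. With the selected variables as inputs and health literacy status (yes/no) as the outcome, logistic regression, random forest and support vector machine (SVM) models were built on the dataset before and after SMOTE processing, and model performance was evaluated with receiver operating characteristic (ROC) curves. Results Before SMOTE processing, the test-set accuracies of logistic regression, random forest and SVM were 0.833, 0.600 and 0.636, respectively, and their areas under the ROC curve (AUC) were 0.723, 0.815 and 0.728, respectively. After SMOTE processing, the test-set accuracies were 0.936, 0.908 and 0.890, respectively, and the AUCs were 0.896, 0.944 and 0.897, respectively. Conclusion The random forest model shows high application value for predicting whether older adults have adequate health literacy.

Keywords: elderly; health literacy; model; statistics
Received: 2019-07-18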
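SMOTE enlarges the minority class with synthetic cases instead of duplicating existing ones: each new case is placed at a random point on the line segment joining a minority-class case and one of its k nearest minority-class neighbours. The abstract does not report the neighbour count, the oversampling ratio or the software used, so the following is only a minimal pure-Python sketch under stated assumptions: numeric features, Euclidean distance, k = 5 (the value used in the original SMOTE paper and the imbalanced-learn default), and base cases drawn at random with replacement. The function name `smote` is hypothetical, and this is not the authors' code.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic new minority-class points via SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats), all from the
    minority class. k=5 is an assumed neighbour count, not taken from the study.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # base case, drawn with replacement
        x = minority[i]
        # indices of the k nearest minority-class neighbours of x (Euclidean), excluding x
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy data, not from the study
    new_points = smote(minority, n_synthetic=6, k=2, seed=42)
    print(len(new_points))  # 6
```

Two caveats apply to any such sketch. Plain SMOTE interpolation is defined for continuous features; for survey data with categorical items, a variant such as SMOTE-NC is usually preferred. Oversampling is also generally applied to the training split only, so that the test set keeps the observed class distribution.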

Prediction of health literacy of the elderly based on SMOTE algorithm and machine learning
WANG Ke, ZHAO Hua-shuo, ZHANG Hong, HUANG Shui-ping, JIN Ying-liang, ZENG Ping. Prediction of health literacy of the elderly based on SMOTE algorithm and machine learning[J]. Chinese Journal of School Doctor, 2019, 33(9): 641.
Authors: WANG Ke, ZHAO Hua-shuo, ZHANG Hong, HUANG Shui-ping, JIN Ying-liang, ZENG Ping
Institution: Department of Epidemiology and Health Statistics, School of Public Health, Xuzhou Medical University, Xuzhou 221004, Jiangsu, China
Abstract: Objective To explore the application of the synthetic minority oversampling technique (SMOTE) combined with machine learning models in predicting whether the elderly have adequate health literacy. Methods Univariate screening was used to select the variables associated with health literacy. With the selected variables as input variables and health literacy status (yes/no) as the outcome variable, logistic regression, random forest and support vector machine (SVM) models were established on the datasets before and after SMOTE processing, and model performance was evaluated with the receiver operating characteristic (ROC) curve. Results Before SMOTE processing, the test-set accuracies of logistic regression, random forest and SVM were 0.833, 0.600 and 0.636, respectively, and the areas under the ROC curve (AUC) of the three models were 0.723, 0.815 and 0.728, respectively. After SMOTE processing, the test-set accuracies were 0.936, 0.908 and 0.890, respectively, and the AUCs were 0.896, 0.944 and 0.897, respectively. Conclusion The random forest model has high application value in predicting whether the elderly have adequate health literacy.
Keywords: elderly; health literacy; model; statistics
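The results compare the models by test-set accuracy and by AUC. As a hedged illustration of what each number measures (the authors' code is not given), the sketch below computes accuracy from hard class predictions and AUC through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. The 0/1 outcome coding, the 0.5 threshold and the toy scores are assumptions for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class equals the observed class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2.

    y_true: 0/1 labels (1 = has adequate health literacy, an assumed coding).
    scores: predicted probabilities or any risk score where higher means positive.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # compare every positive/negative pair
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    prob = [0.1, 0.4, 0.35, 0.8]  # toy scores, not data from the study
    pred = [int(p >= 0.5) for p in prob]  # 0.5 threshold is an assumption
    print(accuracy(y, pred))  # 0.75
    print(roc_auc(y, prob))  # 0.75
```

Accuracy depends on the chosen threshold and on the class balance of the test set, whereas AUC does not depend on a threshold; this is one reason the two metrics can rank models differently, as in the pre-SMOTE results, where logistic regression had the highest accuracy (0.833) but random forest had the highest AUC (0.815). The pairwise loop costs O(n_pos × n_neg) comparisons, which is fine for small test sets; library implementations typically sort the scores instead.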