
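Dengue fever risk assessment based on random forest regression models
Cite this article: HUANG Yu-lin, ZHAO Yong-qian, CAO Zheng, LIU Tao, DENG Ai-ping, XIAO Jian-peng, ZHANG Bing, ZHU Guang-hu, PENG Zhi-qiang, MA Wen-jun. Dengue fever risk assessment based on random forest regression models [J]. South China Journal of Preventive Medicine (华南预防医学), 2019, 45(1): 26-31.
Authors: HUANG Yu-lin  ZHAO Yong-qian  CAO Zheng  LIU Tao  DENG Ai-ping  XIAO Jian-peng  ZHANG Bing  ZHU Guang-hu  PENG Zhi-qiang  MA Wen-jun
Affiliations: School of Basic Medicine, Jinan University, Guangzhou, Guangdong 510632; School of Geographical Sciences, Guangzhou University; Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention; Guangdong Provincial Center for Disease Control and Prevention
Funding: National Key Research and Development Program of China (2018YFB0505500, 2018YFB0505503); Guangdong Provincial Science and Technology Program (2014A040401041); National Natural Science Foundation of China (81773497)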
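Abstract: Objective To build a small-spatial-scale dengue fever risk assessment tool based on random forest regression models, providing evidence for dengue prevention and control. Methods Data on dengue cases and related factors from January 2012 to September 2014 were used as the training set to build separate random forest regression models for three risk indicators: dengue epidemic frequency, duration, and intensity. Data on dengue cases and related factors from October 2014 to December 2015 were used as the validation set to evaluate the models. Results The correlation coefficients between the case-count indicator and each of the frequency, duration, and intensity indicators were all >0.7. The random forest regression models built on the training set for the frequency, duration, and intensity indicators explained 96.72%, 91.98%, and 90.1% of the variance, respectively, suggesting good model fit. Under cross-validation, the mean squared errors of the three models were 0.0019, 1.4246, and 1.8811, respectively, all at a low level. In a comparison of the accuracy of random forest regression, support vector regression, generalized linear models, and generalized additive models, the machine learning models (random forest regression and support vector regression) had mean squared errors far lower than those of the generalized linear and generalized additive models. Conclusion Random forest regression models built with dengue frequency, duration, and intensity indicators as outcome variables and meteorological, environmental, and socioeconomic characteristics as predictors showed good accuracy and can serve as a dengue risk assessment tool in support of dengue prevention and control.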

Keywords: Dengue fever  Random forest regression  Risk assessment
Received: 2018-09-12

Risk assessment of dengue fever based on random forest model
HUANG Yu-lin, ZHAO Yong-qian, CAO Zheng, LIU Tao, DENG Ai-ping, XIAO Jian-peng, ZHANG Bing, ZHU Guang-hu, PENG Zhi-qiang, MA Wen-jun. Risk assessment of dengue fever based on random forest model [J]. South China Journal of Preventive Medicine, 2019, 45(1): 26-31.
Authors:HUANG Yu-lin  ZHAO Yong-qian  CAO Zheng  LIU Tao  DENG Ai-ping  XIAO Jian-peng  ZHANG Bing  ZHU Guang-hu  PENG Zhi-qiang  MA Wen-jun
Institution: 1. Jinan University Faculty of Medical Science, Guangzhou 510632, China; 2. School of Geographical Sciences, Guangzhou University; 3. Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention; 4. Guangdong Provincial Center for Disease Control and Prevention
Abstract: Objective To construct a small-spatial-scale dengue risk assessment tool based on the random forest model, so as to provide a scientific basis for the prevention and control of dengue fever. Methods Data on dengue cases and related factors from February 2012 to September 2014 were used as the training set, and random forest regression (RFR) models were constructed separately for the frequency, duration, and intensity of dengue fever. Data on dengue cases and related factors from October 2014 to March 2015 were used as the testing set to verify the accuracy of the models. Results The correlation coefficients between incidence and each of the frequency, duration, and intensity of dengue fever were all higher than 0.7. Based on the training set, the pseudo R-squared values of the frequency, duration, and intensity models were 96.72%, 91.98%, and 90.1%, respectively; the cross-validated mean squared errors (MSEs) of the models were 0.0019, 1.4246, and 1.8811, respectively. In a comparison of the accuracy of RFR, support vector regression (SVR), the generalized linear model (GLM), and the generalized additive model (GAM), the MSEs of RFR and SVR were much lower than those of GLM and GAM. Conclusion The RFR models constructed using the frequency, duration, and intensity of dengue fever as outcome variables and meteorological, environmental, and socioeconomic characteristics as predictors showed good accuracy and can be used as a risk assessment tool for the prevention and control of dengue fever outbreaks.
Keywords:Dengue  Random forest regression  Risk assessment  
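The abstract reports two evaluation quantities for each model: a pseudo R-squared (share of variance explained) and a cross-validated MSE. The following is a minimal sketch of how such quantities can be computed. It is not the authors' code: the data are synthetic, the least-squares line and the mean predictor are hypothetical stand-ins for the regressors being compared (no random forest is fitted here), and the formula 1 - MSE/Var(y) is one common definition of pseudo R-squared that the abstract itself does not spell out.

```python
import random
from statistics import fmean

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def pseudo_r2(y_true, y_pred):
    """1 - MSE / Var(y); assumed definition, not taken from the paper."""
    mean_y = fmean(y_true)
    var_y = fmean((t - mean_y) ** 2 for t in y_true)
    return 1.0 - mse(y_true, y_pred) / var_y

def fit_line(xs, ys):
    """Least-squares line: a hypothetical stand-in for any regressor."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_mean(xs, ys):
    """Baseline that always predicts the training mean."""
    m = fmean(ys)
    return lambda x: m

def cross_validated_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds of shuffled indices."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in held_out]
        scores.append(mse([ys[i] for i in held_out], preds))
    return fmean(scores)

# Synthetic data (illustrative only): a noisy linear relationship.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

line_model = fit_line(xs, ys)
r2_line = pseudo_r2(ys, [line_model(x) for x in xs])
cv_line = cross_validated_mse(fit_line, xs, ys)
cv_mean = cross_validated_mse(fit_mean, xs, ys)

print(f"pseudo R-squared (line): {r2_line:.3f}")
print(f"cross-validated MSE, line vs. mean baseline: {cv_line:.3f} vs. {cv_mean:.3f}")
```

Comparing models by cross-validated MSE, as in the last two lines, mirrors the kind of comparison the abstract describes among RFR, SVR, GLM, and GAM: the model with the lower held-out error is judged more accurate.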
This article is indexed in Wanfang Data (万方数据) and other databases.