
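Analysis of key factors in the quality formation of Gastrodia elata based on Group-Lasso
Cite this article: WANG Hong-jie, WANG Ke, YU Shui-xiang, MA Yun-tong. Analysis of key factors in Gastrodia elata quality formation based on Group-Lasso[J]. Chinese Traditional and Herbal Drugs, 2023, 54(13): 4278-4285.
Authors: WANG Hong-jie  WANG Ke  YU Shui-xiang  MA Yun-tong
Affiliations: College of Science, Guilin University of Technology, Guilin 541000, Guangxi, China; School of Big Data and Artificial Intelligence, Chengdu Technological University, Chengdu 611730, Sichuan, China; College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 610075, Sichuan, China
Funding: Key R&D Projects of the Sichuan Provincial Department of Science and Technology: Research and demonstration of key technologies and product development for the deep processing of major Sichuan-produced daodi (genuine regional) medicinal materials (2020YFN0152); Research on key technologies and equipment for the quality evaluation of Sichuan-produced daodi medicinal materials (2021YFS0045)
Abstract: Objective To improve the quality of cultivated Gastrodia elata (Tianma), a random forest regression model built on Group-Lasso variable screening was used to analyze the key factors affecting quality formation in G. elata. Methods Using the Group-Lasso method, variables were screened from data on gastrodin content and production-area environmental variables reported in the G. elata quality literature from 2007 to 2022; a random forest regression model was then built on the selected variables, and variable importance scores were calculated. Results Nine variables were finally selected: production area, growth status, germplasm type, production-area climate type, production-area soil type, mean temperature of the hottest month, annual precipitation, annual sunshine hours, and frost-free period. For the random forest regression model built on the selected variables and gastrodin content, the mean square error (MSE) and mean absolute percentage error (MAPE) were 0.1032 and 14.08%, respectively. The feature importance ranking showed that annual precipitation in the production area had the greatest influence on gastrodin content, followed by production-area soil type, frost-free period, and annual sunshine hours. Conclusion The random forest regression model has relatively low error and high prediction accuracy, is more suitable for analyzing the G. elata planting environment and estimating gastrodin content, and provides a reference for the artificial cultivation of G. elata.

Keywords: Gastrodia elata  gastrodin  Group-Lasso  variable screening  random forest regression  variable importance score
Received: 2023/2/23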
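
The abstract does not describe how the Group-Lasso screening was implemented. As a rough illustration of why the method keeps or drops a whole variable at once, the sketch below shows the group soft-thresholding step that Group-Lasso solvers commonly rely on: every coefficient group (for example, all dummy columns of one categorical variable) is shrunk by its Euclidean norm, so the group is either kept together or set to zero together. The function name, the group layout, the coefficient values, and the threshold are all hypothetical and are not taken from the paper.

```python
import math


def group_soft_threshold(beta, groups, threshold):
    """Shrink each coefficient group toward zero by its Euclidean norm.

    beta      -- list of coefficients
    groups    -- dict mapping a variable name to the coefficient indices it owns
    threshold -- non-negative shrinkage amount (hypothetical value)
    """
    result = list(beta)
    for name, idx in groups.items():
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # Scale the threshold by sqrt(group size), a common weighting choice.
        limit = threshold * math.sqrt(len(idx))
        # A group whose norm is below the limit is zeroed out entirely.
        scale = 0.0 if norm <= limit else 1.0 - limit / norm
        for i in idx:
            result[i] = scale * beta[i]
    return result


# Hypothetical layout: three dummy columns for soil type, one numeric column.
groups = {"soil_type": [0, 1, 2], "annual_precipitation": [3]}
beta = [0.10, -0.05, 0.08, 0.90]

shrunk = group_soft_threshold(beta, groups, threshold=0.2)

# A variable counts as selected if any coefficient in its group survives.
selected = [
    name for name, idx in groups.items()
    if any(shrunk[i] != 0.0 for i in idx)
]
print(shrunk)
print(selected)
```

With these made-up numbers the weak soil-type group is removed as a unit while the precipitation coefficient is only shrunk; this says nothing about which variables the study itself retained.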

Analysis of key factors in Gastrodia elata quality formation based on Group-Lasso
WANG Hong-jie, WANG Ke, YU Shui-xiang, MA Yun-tong. Analysis of key factors in Gastrodia elata quality formation based on Group-Lasso[J]. Chinese Traditional and Herbal Drugs, 2023, 54(13): 4278-4285.
Authors: WANG Hong-jie  WANG Ke  YU Shui-xiang  MA Yun-tong
Institution: College of Science, Guilin University of Technology, Guilin 541000, China; School of Big Data and Artificial Intelligence, Chengdu Technological University, Chengdu 611730, China; College of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 610075, China
Abstract: Objective To improve the quality of artificially planted Tianma (Gastrodia elata), a random forest regression model based on Group-Lasso variable screening was constructed to analyze the key factors affecting the quality of G. elata. Methods Using the Group-Lasso method, variables were screened from data on gastrodin content and production-area environmental variables reported in the G. elata quality literature from 2007 to 2022. A random forest regression model was then established on the selected variables, and variable importance scores were calculated. Results Nine variables were finally selected: production area, growth status, germplasm type, production-area climate type, production-area soil type, average temperature in the hottest month, annual precipitation in the production area, annual sunshine hours in the production area, and frost-free period. A random forest regression model was established based on the selected variables and gastrodin content. The mean square error (MSE) and mean absolute percentage error (MAPE) were 0.1032 and 14.08%, respectively. The feature importance ranking showed that the biggest influencing factor of gastrodin content was annual precipitation in the production area, followed by production-area soil type, frost-free period, and annual sunshine hours in the production area. Conclusion The random forest regression model had relatively low error and high prediction accuracy, and was more suitable for the analysis of the G. elata planting environment and the estimation of gastrodin content, providing a reference for the artificial cultivation of G. elata.
Keywords: Gastrodia elata Bl.  gastrodin  Group-Lasso  variable screening  random forest regression  variable importance measures
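
For readers unfamiliar with the two error measures quoted in the abstract, the following minimal sketch computes MSE and MAPE from paired observed and predicted values. The helper names and the sample numbers are made up for illustration; they are not the study's data and do not reproduce the reported 0.1032 or 14.08%.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to the observed value, in percent.

    Assumes no observed value is zero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical gastrodin contents and model predictions (illustrative only).
actual = [0.50, 0.40, 0.25]
predicted = [0.45, 0.44, 0.25]

mse = mean_squared_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(f"MSE = {mse:.4f}, MAPE = {mape:.2f}%")
```

MSE is expressed in squared units of the response, whereas MAPE is scale-free, which is one reason the two are often reported side by side.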