
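Named Entity Recognition in Traditional Chinese Medicine Texts Based on BiLSTM-CRF
Cite this article: 肖瑞,胡冯菊,裴卫. 基于BiLSTM-CRF的中医文本命名实体识别[J]. 世界科学技术-中医药现代化, 2020, 22(7): 2506-2512
Authors: Xiao Rui, Hu Fengju, Pei Wei
Affiliations: School of Information Engineering, Hubei University of Chinese Medicine, Wuhan 430065; First Clinical College, Hubei University of Chinese Medicine, Wuhan 430065
Funding: Hubei University of Chinese Medicine "Qingmiao Plan" project [No. 2017ZZX016]: Research on a diagnosis prediction algorithm for chronic hepatitis B based on TCM electronic medical records, principal investigator: Xiao Rui; State Administration of Traditional Chinese Medicine 2018 TCM legal system development project [No. GZY-FJS-2018-162]: Monitoring of false and illegal TCM medical advertisements on the internet, principal investigator: Xiao Rui.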
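Abstract: Named entity recognition plays an important role in text mining of traditional Chinese medicine (TCM) literature. This paper applies the BiLSTM-CRF method to named entity recognition in TCM medical case texts. Beyond basic entity recognition, the data set is labeled with three categories (Chinese herbal medicine, disease, and symptom), so the model can also identify the category of each entity. Sequence labeling was performed on 10,292 sentences obtained by normalizing TCM medical case records, word vectors were built with word2vec, and the model was trained iteratively. The resulting TCM named entity recognition model achieved an accuracy of 97.23%, a recall of 89.47%, and an F value of 88.34%. By category, Chinese herbal medicine reached a precision of 94.41%, a recall of 94.36%, and an F value of 94.38%; disease reached a precision of 80.92%, a recall of 80.92%, and an F value of 80.92%; symptom reached a precision of 75.68%, a recall of 81.68%, and an F value of 78.56%. In manual testing the model performed well and was able to recognize entities in medical case data. Many named entity recognition models exist, but very few target TCM; building TCM-specific named entity recognition models will more effectively advance TCM text mining.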

Keywords: text mining; traditional Chinese medicine; named entity; LSTM
Received: 2019-05-13
Revised: 2020-07-01

Chinese Medicine Text Named Entity Recognition Based on BiLSTM-CRF
Xiao Rui, Hu Fengju and Pei Wei. Chinese Medicine Text Named Entity Recognition Based on BiLSTM-CRF[J]. World Science and Technology—Modernization of Traditional Chinese Medicine and Materia Medica, 2020, 22(7): 2506-2512
Authors: Xiao Rui, Hu Fengju, Pei Wei
Affiliation: Hubei University of Chinese Medicine, Wuhan 430065, China; Hubei University of Chinese Medicine, Wuhan 430065, China; Hubei University of Chinese Medicine, Wuhan 430065, China
Abstract: Named entity recognition occupies an important position in text mining of traditional Chinese medicine (TCM). This article applies the BiLSTM-CRF method to named entity recognition in TCM medical case texts. In addition to basic named entity recognition, the data set is labeled with three categories (Chinese herbal medicine, disease, and symptom), so the model can also identify named entity classes. Sequence annotation was performed on 10,292 sentences of TCM-related medical cases, and vectors were constructed with word2vec for iterative model training. The resulting TCM named entity recognition model achieved an accuracy of 97.23%, a recall of 89.47%, and an F value of 88.34%. By category, the precision for Chinese herbal medicine was 94.41%, the recall 94.36%, and the F value 94.38%. The precision for disease was 80.92%, the recall 80.92%, and the F value 80.92%. The precision for symptom was 75.68%, the recall 81.68%, and the F value 78.56%. Many named entity recognition models exist, but very few are used for TCM-related named entity recognition. Establishing TCM-related named entity recognition models will therefore promote the development of TCM text mining more effectively.
Keywords:Text mining  TCM  Named entity  LSTM
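
The per-category precision, recall, and F values in the abstract can be illustrated with a small sketch. The abstract does not state the tagging scheme or the scoring procedure, so everything here is an assumption: a character-level BIO scheme, the hypothetical label names `HERB`, `DIS`, and `SYM` for the three categories, exact-match scoring of entity spans, and a toy five-character sentence that is not taken from the paper's data.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled entity spans.

    An I- tag that does not continue an open entity of the same label
    is treated like O (an assumption; other conventions exist).
    """
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if inside:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def prf(gold: Set[Span], pred: Set[Span], label: str) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F value for one entity category."""
    g = {s for s in gold if s[2] == label}
    p = {s for s in pred if s[2] == label}
    tp = len(g & p)  # spans whose boundaries and label both match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Toy example (hypothetical): one symptom followed by one herb.
chars = list("咳嗽用麻黄")
gold_tags = ["B-SYM", "I-SYM", "O", "B-HERB", "I-HERB"]
pred_tags = ["B-SYM", "O", "O", "B-HERB", "I-HERB"]  # symptom boundary is wrong

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
for category in ("HERB", "DIS", "SYM"):
    print(category, prf(gold_spans, pred_spans, category))
```

Under exact-match scoring a partially correct boundary counts as both a false positive and a false negative, which is one common reason a category with longer or fuzzier entities can score lower than one with short, well-delimited names.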
This article is indexed in Wanfang Data (万方数据) and other databases.