
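TcmYiAnBERT: A Traditional Chinese Medicine Case Pre-training Model Based on Unsupervised Learning (original title: TcmYiAnBERT:基于无监督学习的中医医案预训练模型)
Cite this article: HU Wei, LIU Wei, SHENG Wei, LU Yanjie, SHI Yujing. TcmYiAnBERT: A Traditional Chinese Medicine Case Pre-training Model Based on Unsupervised Learning[J]. Journal of Medical Informatics (医学信息学杂志), 2023, 44(7): 63-67
Authors: HU Wei (胡为), LIU Wei (刘伟), SHENG Wei (盛威), LU Yanjie (卢彦杰), SHI Yujing (石玉敬)
Affiliation: School of Informatics, Hunan University of Chinese Medicine (湖南中医药大学信息科学与工程学院), Changsha 410013
Funding: Hunan Provincial Natural Science Foundation (Project No. 2022JJ30438); Hunan University of Chinese Medicine University-level Natural Science Foundation (Project No. 2022XJZKC016); Scientific Research Project of the Hunan Provincial Department of Education (Project No. 20C1435).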
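Abstract: Purpose/Significance To fully mine the textual information in traditional Chinese medicine (TCM) medical cases, raise the level of TCM informatization, and improve the accuracy of downstream tasks on TCM medical cases such as symptom term extraction and relation extraction. Method/Process A large volume of TCM medical case data is collected through optical character recognition and web crawling and then preprocessed to build a pre-training dataset for the TCM medical case domain. Using the BERT pre-training method, multiple rounds of training yield TcmYiAnBERT, the first domain-specific pre-trained model for TCM, which is released as open source. Result/Conclusion On a TCM named entity recognition task, the domain-specific model TcmYiAnBERT achieves an F1 score 2.8 percentage points higher than pre-trained models that do not use it.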
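The reported gain is a difference between two F1 scores, expressed in percentage points. The sketch below shows one common way to compute entity-level F1 for named entity recognition and to express a gain in percentage points. It assumes exact span-and-label matching, which the abstract does not specify; the function name `entity_f1`, the entity labels, and the toy gold and predicted sets are all hypothetical and do not reproduce the paper's data or its 2.8-point result.

```python
from typing import Set, Tuple

# An entity is (start, end, label). Exact-match scoring is an assumption here;
# the abstract does not state which matching scheme the authors used.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Entity-level F1 under exact span-and-label matching."""
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy annotations, invented for illustration only.
gold = {(0, 2, "SYMPTOM"), (5, 7, "HERB"), (9, 12, "SYMPTOM"), (14, 16, "HERB")}
baseline_pred = {(0, 2, "SYMPTOM"), (5, 7, "HERB"), (9, 11, "SYMPTOM")}
domain_pred = {(0, 2, "SYMPTOM"), (5, 7, "HERB"), (9, 12, "SYMPTOM")}

baseline_f1 = entity_f1(gold, baseline_pred)
domain_f1 = entity_f1(gold, domain_pred)

# A gain in percentage points is the plain difference of the two scores
# scaled by 100, not a relative (percent) improvement.
gain_pp = (domain_f1 - baseline_f1) * 100
print(f"baseline F1={baseline_f1:.4f}, domain F1={domain_f1:.4f}, gain={gain_pp:.2f} pp")
```

Note the distinction the last step makes: a gain of 2.8 percentage points means the F1 score itself rose by 0.028, regardless of how large the baseline score was.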

Keywords: TCM medical cases; pre-training model; TcmYiAnBERT; named entity recognition; artificial intelligence
Revised: 2022-12-24

TcmYiAnBERT: A Traditional Chinese Medicine Case Pre-training Model Based on Unsupervised Learning
HU Wei, LIU Wei, SHENG Wei, LU Yanjie, SHI Yujing. TcmYiAnBERT: A Traditional Chinese Medicine Case Pre-training Model Based on Unsupervised Learning[J]. Journal of Medical Informatics, 2023, 44(7): 63-67
Authors: HU Wei  LIU Wei  SHENG Wei  LU Yanjie  SHI Yujing
Affiliation: School of Informatics, Hunan University of Chinese Medicine, Changsha 410013, China
Abstract: Purpose/Significance To fully mine the text information in traditional Chinese medicine (TCM) medical cases, to raise the level of TCM informatization, and to improve the accuracy of downstream tasks such as symptom term extraction and relation extraction in TCM medical cases. Method/Process A large volume of TCM medical case data is obtained through optical character recognition (OCR) and web crawling and then preprocessed, and a pre-training dataset for the TCM medical case domain is constructed. Using the BERT pre-training method, multiple rounds of training yield TcmYiAnBERT, the first domain-specific pre-trained model for the TCM field, which is released as open source. Result/Conclusion Experiments show that, on a TCM named entity recognition (NER) task, the domain-specific pre-trained model TcmYiAnBERT achieves an F1 score 2.8 percentage points higher than pre-trained models that do not use it.
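The abstract refers to the BERT pre-training method, whose unsupervised objective is masked language modeling: some input tokens are corrupted and the model learns to recover them. The sketch below illustrates the corruption step from the original BERT recipe (about 15% of positions selected; of those, 80% replaced by `[MASK]`, 10% by a random token, 10% left unchanged). It assumes character-level tokenization, as is common for Chinese BERT models, and per-token random selection. The function name `mask_tokens`, the toy sentence, and the toy vocabulary are hypothetical, and the abstract does not state which settings the authors actually used.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    rng: random.Random,
    mask_prob: float = 0.15,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted inputs, labels); labels are None at unselected positions."""
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask symbol
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


# Made-up sentence in the style of a medical case record, split into characters.
sentence = "患者头痛发热，舌红苔黄，脉浮数"
tokens = list(sentence)
vocab = sorted(set(tokens))  # toy vocabulary built from the sentence itself

masked, labels = mask_tokens(tokens, vocab, random.Random(0))
print(masked)
print(labels)
```

Because no labels are needed beyond the text itself, this objective can be trained directly on raw case text such as the OCR and crawled data the abstract describes.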