Artificial intelligence based Chinese clinical trials eligibility criteria classification
ZONG Hui, ZHANG Zeyu, YANG Jinxuan, LEI Jianbo, LI Zuofeng, HAO Tianyong, ZHANG Xiaoyan. Artificial intelligence based Chinese clinical trials eligibility criteria classification[J]. Journal of Biomedical Engineering, 2021, 0(1): 105-110,121
Authors: ZONG Hui, ZHANG Zeyu, YANG Jinxuan, LEI Jianbo, LI Zuofeng, HAO Tianyong, ZHANG Xiaoyan
Affiliation: (School of Life Sciences and Technology, Tongji University, Shanghai 200092, P.R.China; Center for Medical Informatics, Peking University, Beijing 100080, P.R.China; Philips Research China, Shanghai 200072, P.R.China; School of Computer Science, South China Normal University, Guangzhou 510631, P.R.China)
Funding: National Natural Science Foundation of China (81972914).
Abstract: Subject recruitment is a key step that affects the progress and results of clinical trials, and it is generally carried out according to eligibility criteria (inclusion criteria and exclusion criteria). Analyzing the semantic categories of eligibility criteria can help optimize clinical trial design and support the development of automated patient recruitment systems. Through an academic shared task, this study explored the automatic classification of Chinese eligibility criteria into semantic categories using artificial intelligence. We collected a total of 38,341 annotated eligibility criteria sentences and predefined 44 semantic categories. A total of 75 teams registered for the task, and 27 of them submitted system outputs. Analysis of the results showed that most teams adopted hybrid models. The mainstream approach was to use pretrained language models that provide rich semantic representations, combine them with neural network models, fine-tune them for the classification task, and finally improve performance through model ensembling. The best-performing system achieved a macro-averaged F1 score of 0.81 by combining a pretrained language model, bidirectional encoder representations from transformers (BERT), with model ensembling. Error analysis showed that, in terms of processing steps, data pre-processing and post-processing were very important for classification, while, in terms of data volume, categories with fewer samples showed lower classification performance. We hope that this study provides a valuable dataset and state-of-the-art results for research on short-text classification of Chinese clinical trial eligibility criteria.
Keywords: clinical trial; eligibility criteria; text classification; artificial intelligence; natural language processing
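The abstract reports results as a macro-averaged F1 score and names model ensembling as the final step of most systems, but it does not say how the ensembles combined their models. The Python sketch below assumes soft voting (averaging each model's per-class probabilities and taking the argmax), which is one common choice and is not confirmed by the source. It also assumes macro-F1 is averaged over every label that appears in either the gold or the predicted labels. The function names `soft_vote` and `macro_f1`, the probability-dictionary input format, and the labels `A`, `B`, `C` are all hypothetical. Because macro-F1 weights every category equally, weak results on small categories pull the overall score down, which is consistent with the error analysis above.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical input format: one list per model, holding a {label: probability}
# dict for each sentence, in the same sentence order for every model.
ModelProbs = Sequence[Dict[str, float]]


def soft_vote(all_model_probs: Sequence[ModelProbs]) -> List[str]:
    """Soft-voting ensemble (an assumed strategy): average class probabilities, take the argmax."""
    n_models = len(all_model_probs)
    predictions = []
    # zip(*...) groups the outputs of all models for the same sentence.
    for per_sentence in zip(*all_model_probs):
        summed: Dict[str, float] = {}
        for probs in per_sentence:
            for label, p in probs.items():
                summed[label] = summed.get(label, 0.0) + p
        averaged = {label: s / n_models for label, s in summed.items()}
        predictions.append(max(averaged, key=averaged.get))
    return predictions


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so every class counts equally however rare it is."""
    labels = set(gold) | set(pred)  # assumption: score every label seen in gold or pred
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_count, pred_count = Counter(gold), Counter(pred)
    scores = []
    for label in labels:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy outputs from three hypothetical models for three sentences.
    model_1 = [{"A": 0.6, "B": 0.4}, {"A": 0.2, "B": 0.8}, {"A": 0.4, "C": 0.6}]
    model_2 = [{"A": 0.7, "B": 0.3}, {"A": 0.6, "B": 0.4}, {"A": 0.5, "C": 0.5}]
    model_3 = [{"A": 0.1, "B": 0.9}, {"A": 0.3, "B": 0.7}, {"A": 0.2, "C": 0.8}]
    preds = soft_vote([model_1, model_2, model_3])
    print(preds)  # ['B', 'B', 'C']
    print(round(macro_f1(["A", "B", "C"], preds), 4))  # 0.5556
```

In this toy run, class `A` is never predicted, so its F1 of 0 counts as one third of the macro average even though it covers only one sentence. In a real system, each model's probabilities would come from a fine-tuned classifier (for example a BERT-based one) over the 44 categories; the small dictionaries here only stand in for those outputs.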