医学文献主题新颖性探测方法对比分析 (Comparative analysis of subject novelty detection methods in medical literature)
Cite this article: 陈斯斯, 董立平, 许丹, 郭继军. 医学文献主题新颖性探测方法对比分析[J]. 中华医学图书情报杂志, 2018, 27(2): 20-25.
Authors: 陈斯斯 (CHEN Si-si), 董立平 (DONG Li-ping), 许丹 (XU Dan), 郭继军 (GUO Ji-jun)
Affiliation (all four authors): Library of China Medical University (中国医科大学图书馆), Shenyang 110122, Liaoning, China
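Abstract: Objective: To explore the feasibility of using novelty detection models to assess the topic novelty of medical literature, and to compare the strengths and weaknesses of two novelty detection methods (the word-overlap method and the co-word-based inverse document frequency quantification method). Methods: Eight research topics in the biomedical field were selected and the literature was collected from the PubMed database. Two novelty detection models were built and, against expert assessments of the topic novelty of the literature, the feasibility of the two models was evaluated using ROC curves and AUC values. Results: The novelty scores computed by the word-overlap method fluctuated more widely and thus better expressed the differences in content between documents in the data. Based on the ROC curve and AUC analysis, the word-overlap method was accurate to some degree in identifying novel documents, whereas the co-word-based inverse document frequency quantification method had low accuracy. Conclusion: The novelty scores produced by the two methods are moderately correlated, the difference between their means is statistically significant, and the former method outperforms the latter.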

Keywords: Literature topic  Novelty detection  ROC curve  Feasibility analysis
Received: 2018-01-03

Comparative analysis of subject novelty detection methods in medical literature
CHEN Si-si, DONG Li-ping, XU Dan and GUO Ji-jun. Comparative analysis of subject novelty detection methods in medical literature[J]. Chinese Journal of Medical Library and Information Science, 2018, 27(2): 20-25.
Authors: CHEN Si-si, DONG Li-ping, XU Dan and GUO Ji-jun
Institution (all authors): Library of China Medical University, Shenyang 110122, Liaoning Province, China
Abstract: Objective To study the feasibility of using novelty detection models to assess the subject novelty of medical literature and to compare the advantages and disadvantages of two methods: the word-overlap algorithm and the co-word-based inverse document frequency quantification algorithm. Methods Eight research subjects in the biomedical field were selected and the literature was collected from PubMed. Two novelty detection models were established, and their feasibility was assessed against expert analysis of the subject novelty of the literature, using ROC curves and AUC values. Results The novelty values calculated with the word-overlap algorithm fluctuated more widely and therefore reflected the differences in content between documents more clearly in the data. ROC curve and AUC analysis showed that the word-overlap algorithm had a certain degree of accuracy in identifying novel literature, whereas the co-word-based inverse document frequency quantification algorithm had low accuracy. Conclusion The novelty values calculated with the two methods are moderately correlated, and the difference between their means is statistically significant. The word-overlap algorithm performs better than the co-word-based inverse document frequency quantification algorithm.
Keywords:Subject of literature  Novelty detection  ROC curve  Feasibility analysis
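
The abstract names the word-overlap method and the ROC/AUC evaluation but does not give formulas. The sketch below is one common way to express both ideas, not the authors' implementation: it assumes novelty is one minus the share of a document's words already seen in earlier documents, and it computes AUC as the probability that an expert-labelled novel document scores higher than a non-novel one. The function names (`tokenize`, `word_overlap_novelty`, `auc`) and the whitespace tokenizer are illustrative assumptions.

```python
from typing import Iterable, Sequence, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on whitespace (assumption: a real pipeline
    would also remove stop words and punctuation)."""
    return set(text.lower().split())


def word_overlap_novelty(history: Iterable[str], candidate: str) -> float:
    """Return 1 - (share of the candidate's words seen in earlier documents).

    This formula is an assumed, simplified reading of "word overlap";
    the abstract does not specify the exact definition.
    """
    seen: Set[str] = set()
    for doc in history:
        seen |= tokenize(doc)
    words = tokenize(candidate)
    if not words:
        return 0.0  # an empty document carries no new words
    return 1.0 - len(words & seen) / len(words)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (novel, non-novel) pairs in which the novel
    document has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one novel and one non-novel label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    earlier = ["gene therapy for cancer"]
    print(word_overlap_novelty(earlier, "crispr editing for cancer"))
    # Hypothetical novelty scores with expert labels (1 = novel, 0 = not novel).
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

An AUC near 0.5 would mean the novelty scores rank documents no better than chance relative to the expert labels, which is how a "low accuracy" result such as the one reported for the inverse document frequency method would show up in this kind of evaluation.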