The Clementine Solution for Web Log Preprocessing
Cite this article: ZHENG Hui-xi, XU Shuo. The Clementine Solution for Web Log Preprocessing[J]. Journal of Medical Informatics, 2009, 30(12): 33-36
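Authors: ZHENG Hui-xi, XU Shuo
Affiliations: 1. Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100005
2. Institute of Scientific and Technical Information of China, Beijing 100038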
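Funding: Special Fund for Basic Scientific Research of the Institute of Medical Information, Chinese Academy of Medical Sciences, project "Analysis of Reader Behavior on Library Websites Based on Web Log Statistics"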
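Abstract: Using Clementine, we built a preliminary data stream for Web log preprocessing. It implements four main stages (data cleaning, user identification, session identification, and path completion) and also provides auxiliary functions for log merging, data auditing, standardized coding, and linking to external information. Experiments show that preprocessing Web logs with Clementine is entirely feasible. This lays the groundwork for carrying out the subsequent mining on the same platform, partly resolves the problem of Web log mining and preprocessing being handled by different tools, and raises the degree of automation of Web log mining.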

Keywords: Clementine; Web log preprocessing; data stream
Received: 2009-06-23
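The paper implements these stages as a Clementine stream, which is not reproduced here. To illustrate roughly what the four core stages compute, the following is a minimal Python sketch, not the authors' stream. It rests on several assumptions. The input is taken to be Apache "combined" format log lines. Cleaning keeps only successful (2xx) GET requests and drops image, style, and script files. A user is identified by the (IP address, user agent) pair. A new session starts after a 30-minute idle gap. For path completion, a referrer that appeared earlier in the session is taken to mean the user went back through cached pages. All function names, the suffix list, and the timeout are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlsplit

# Assumed input: Apache "combined" log format (not specified by the paper).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Hypothetical cleaning rule: embedded resources are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
# Hypothetical session threshold (a common heuristic, not from the paper).
SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class Hit:
    ip: str
    agent: str
    time: datetime
    url: str       # request path without query string
    referrer: str  # referrer path, or "" when absent


def parse_and_clean(lines):
    """Data cleaning: keep successful (2xx) GET requests for pages only."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # malformed line
        if m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        url = m["url"].split("?", 1)[0]
        if url.lower().endswith(IGNORED_SUFFIXES):
            continue
        ref = m["referrer"]
        ref_path = urlsplit(ref).path if ref not in ("", "-") else ""
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        hits.append(Hit(m["ip"], m["agent"], when, url, ref_path))
    return hits


def identify_users(hits):
    """User identification: one user per (IP, user agent) pair."""
    users = {}
    for hit in sorted(hits, key=lambda h: h.time):
        users.setdefault((hit.ip, hit.agent), []).append(hit)
    return users


def split_sessions(user_hits):
    """Session identification: new session after an idle gap > timeout."""
    sessions = []
    for hit in user_hits:
        if sessions and hit.time - sessions[-1][-1].time <= SESSION_TIMEOUT:
            sessions[-1].append(hit)
        else:
            sessions.append([hit])
    return sessions


def complete_path(session):
    """Path completion: re-insert pages revisited via the Back button."""
    path = []
    for hit in session:
        ref = hit.referrer
        if path and ref and ref != path[-1] and ref in path:
            # The referrer was seen earlier, so assume the user went back
            # from path[-1] to its most recent occurrence via cached pages.
            i = len(path) - 1 - path[::-1].index(ref)
            path.extend(reversed(path[i:-1]))
        path.append(hit.url)
    return path


def preprocess(lines):
    """Run all four stages; returns (user key, completed path) per session."""
    result = []
    for user, user_hits in identify_users(parse_and_clean(lines)).items():
        for session in split_sessions(user_hits):
            result.append((user, complete_path(session)))
    return result
```

Log merging, one of the auxiliary functions listed in the abstract, could be approximated in this sketch by chaining the lines of several log files (for example with `itertools.chain`) before calling `preprocess`, because `identify_users` re-sorts all hits by timestamp.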

This article is indexed in databases including VIP and Wanfang Data.