
Design and Implementation of Biomedical Information Vertical Search Engine using Nutch Software
Citation: WANG Xiaolei, LI Li, ZHAO Dongsheng. Design and Implementation of Biomedical Information Vertical Search Engine using Nutch Software[J]. Beijing Biomedical Engineering, 2010, 29(6): 638-640, 644.
Authors: WANG Xiaolei (王小磊)  LI Li (李立)  ZHAO Dongsheng (赵东升)
Institution: Institute of Health Service and Medical Information, Academy of Military Medical Sciences, Beijing 100850 (all three authors)
Abstract: When searching the massive volume of information on the web, medical intelligence research and information service organizations often need to build topic-oriented vertical search systems to meet the needs of specific user groups. This paper uses open-source software such as Nutch and Lucene to design and implement a vertical search engine for biomedical information, and describes the key techniques involved: web page crawling, format processing, content indexing, and retrieval. Adding Chinese word segmentation and incremental crawling (re-crawl) modules improves the recognition rate of Chinese keywords and shortens the information update cycle. The system is currently online for testing and returns reasonably accurate and timely search results.

Keywords: Nutch; web crawling; Lucene; Chinese word segmentation; incremental crawling
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
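
The abstract credits a Chinese word segmentation module with improving keyword recognition, but it does not say which algorithm the authors used. Purely as an illustration, the following is a minimal Python sketch of forward maximum matching, one common dictionary-based approach. The function name, the `max_len` parameter, and the toy dictionary are all hypothetical; none of them come from the paper or from the Nutch or Lucene APIs.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary word.

    Hypothetical helper for illustration; not the paper's implementation.
    Characters that start no dictionary word are emitted on their own.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary (an assumption; a real system would load a large lexicon).
DICTIONARY = {"生物", "医学", "生物医学", "信息", "搜索", "引擎", "搜索引擎"}

print(forward_max_match("生物医学信息搜索引擎", DICTIONARY))
# → ['生物医学', '信息', '搜索引擎']
```

With this toy dictionary, the ten-character phrase becomes three word-level terms rather than ten single characters. Word-level terms of this kind are what let an index match a multi-character Chinese query as a unit.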