Assessment of approximate string matching in a biomedical text retrieval problem

Authors:	Wang J F, Li Z R, Cai C Z, Chen Y Z

Affiliation:	Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore.

Abstract:	Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature reduce the accuracy of data mining, and methods for addressing this problem are being explored. This work tests the usefulness of the Smith-Waterman algorithm with an affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from the herbal medicine literature are matched against those from the medicinal chemistry literature using this algorithm at string identity levels from 80% to 100%. Performance is optimal at a string identity of 88%, where recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith-Waterman algorithm is useful for improving the success rate of biomedical text retrieval.
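The matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the scoring values (`match`, `mismatch`, `gap_open`, `gap_extend`), the case-folding step, and the definition of string identity as matched characters divided by the length of the longer name are all assumptions, since the abstract does not specify them. The function names `local_align`, `string_identity`, and `is_match` are hypothetical. Only the 88% threshold comes from the abstract.

```python
NEG_INF = float("-inf")


def step(cell, delta, matched=0):
    """Extend an alignment state (score, matches) by one column."""
    return (cell[0] + delta, cell[1] + matched)


def local_align(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Smith-Waterman local alignment with affine gaps (Gotoh recurrences).

    Returns (best_score, matches) for the best-scoring local alignment.
    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    reset = (0, 0)
    # H: best alignment ending at (i, j); E / F: best alignment ending in a gap.
    H = [[reset] * (m + 1) for _ in range(n + 1)]
    E = [[(NEG_INF, 0)] * (m + 1) for _ in range(n + 1)]
    F = [[(NEG_INF, 0)] * (m + 1) for _ in range(n + 1)]
    best = reset
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(step(E[i][j - 1], gap_extend), step(H[i][j - 1], gap_open))
            F[i][j] = max(step(F[i - 1][j], gap_extend), step(H[i - 1][j], gap_open))
            same = a[i - 1] == b[j - 1]
            diag = step(H[i - 1][j - 1], match if same else mismatch, int(same))
            cell = max(diag, E[i][j], F[i][j])
            # Local alignment: never carry a non-positive score forward.
            H[i][j] = cell if cell[0] > 0 else reset
            best = max(best, H[i][j])
    return best


def string_identity(a, b):
    """Matched characters over the longer name's length (assumed definition)."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    _, matches = local_align(a, b)
    return matches / max(len(a), len(b))


def is_match(a, b, threshold=0.88):
    """Accept a pair of names when their identity reaches the threshold."""
    return string_identity(a, b) >= threshold
```

Under these assumed settings, a herb name with a single character error (for example "Panax ginsemg" against "Panax ginseng") stays above the 88% threshold, while a distinct but similar name such as "Panax notoginseng" falls below it. Raising the threshold trades recall for precision, which is the trade-off the abstract reports across the 80-100% range.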

Keywords:	Bioinformatics; Biomedical; Data mining; Dynamic programming; Herb; Herbal medicine; Literature; Literature search; Medicine; Medicinal plant; Medinformatics; Plant; Smith-Waterman algorithm; Text; Text matching; Word; Word matching
This article is indexed in ScienceDirect, PubMed, and other databases.