Assessment of approximate string matching in a biomedical text retrieval problem
Authors: Wang J F, Li Z R, Cai C Z, Chen Y Z
Affiliation: Department of Computational Science, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore.
Abstract: Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature reduce the accuracy of data mining, and methods for addressing this problem are being explored. This work tests the usefulness of the Smith-Waterman algorithm with an affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from the herbal medicine literature are matched against those from the medicinal chemistry literature using this algorithm at string identity levels from 80% to 100%. Performance is best at a string identity of 88%, where recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith-Waterman algorithm is useful for improving the success rate of biomedical text retrieval.
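The matching step described in the abstract can be illustrated with a minimal sketch of Smith-Waterman local alignment with an affine gap penalty. This is not the authors' implementation. The scoring values (match, mismatch, gap_open, gap_extend), the case-insensitive comparison, the function names, and the definition of string identity as identical aligned characters divided by the length of the longer name are all assumptions made for illustration; the abstract does not specify them. Only the 88% threshold comes from the abstract.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return (best local score, identical characters in that alignment).

    Scoring values are illustrative assumptions, not taken from the paper.
    gap_open is the cost of the first gap character; gap_extend is the cost
    of each additional one. Cells hold (score, matches) tuples, so max()
    prefers the higher score and breaks ties by the number of matches.
    """
    n, m = len(a), len(b)
    H = [[(0, 0)] * (m + 1) for _ in range(n + 1)]        # best alignment ending at (i, j)
    E = [[(NEG_INF, 0)] * (m + 1) for _ in range(n + 1)]  # ends with a gap in a
    F = [[(NEG_INF, 0)] * (m + 1) for _ in range(n + 1)]  # ends with a gap in b
    best = (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(
                (H[i][j - 1][0] + gap_open, H[i][j - 1][1]),
                (E[i][j - 1][0] + gap_extend, E[i][j - 1][1]),
            )
            F[i][j] = max(
                (H[i - 1][j][0] + gap_open, H[i - 1][j][1]),
                (F[i - 1][j][0] + gap_extend, F[i - 1][j][1]),
            )
            same = a[i - 1] == b[j - 1]
            diag = (
                H[i - 1][j - 1][0] + (match if same else mismatch),
                H[i - 1][j - 1][1] + (1 if same else 0),
            )
            # The (0, 0) option restarts the alignment, which makes it local.
            H[i][j] = max((0, 0), diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def string_identity(a, b):
    """Assumed definition: identical aligned characters / longer name length."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    _, matches = smith_waterman_affine(a, b)
    return matches / max(len(a), len(b))


def match_names(query, candidates, threshold=0.88):
    """Return the candidates whose identity with query meets the threshold."""
    return [c for c in candidates if string_identity(query, c) >= threshold]
```

For example, `match_names("Panax ginseng", ["Panax ginsang", "Ginkgo biloba"])` keeps the misspelled variant and rejects the unrelated name under these assumed settings. Raising the threshold toward 100% trades recall for precision, which is the trade-off the abstract evaluates between 80% and 100%.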
Keywords: Bioinformatics; Biomedical; Data mining; Dynamic programming; Herb; Herbal medicine; Literature; Literature search; Medicine; Medicinal plant; Medinformatics; Plant; Smith-Waterman algorithm; Text; Text matching; Word; Word matching
This article is indexed in ScienceDirect, PubMed, and other databases.