

Automated comparative auditing of NCIT genomic roles using NCBI
Authors: Cohen, Barry; Oren, Marc; Min, Hua; Perl, Yehoshua; Halper, Michael
Institution: (a) Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA; (b) Fox Chase Cancer Center, Philadelphia, PA 19111, USA; (c) Computer Science Department, Kean University, Union, NJ 07083, USA
Abstract: Biomedical research has identified many human genes and accumulated a variety of knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Given the rapid advances in this field, the NCIT’s Gene hierarchy can be expected to contain role errors. A comparative methodology for auditing the Gene hierarchy against the National Center for Biotechnology Information’s (NCBI’s) Entrez Gene database is presented. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a structurally common origin. Regarding biological processes, difficulties arise because genes frequently play roles in multiple processes, and processes may have many designations (such as synonymous terms). Our algorithms make use of the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors, and corrections for them, in a terminological genomic knowledge repository, thus facilitating its overall maintenance.
Keywords: NCI Thesaurus; NCBI GenBank; NCBI Entrez Gene; Gene hierarchy; Biological Process hierarchy; Gene terminology; Automated auditing; Web crawler; Biomedical knowledge base
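
Illustrative sketch: the abstract describes comparing gene roles gathered from two sources, flagging discrepancies as probable errors, and suggesting corrections. The Python below is a minimal sketch of that idea for chromosomal locations only. It is not the authors' algorithm. The names `Discrepancy`, `normalize_location`, and `audit_locations`, the normalization rule, and the sample gene symbols and locations are all assumptions made for illustration, and the crawling step is replaced by plain in-memory dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Discrepancy:
    """One probable error found by comparing the two sources (hypothetical type)."""
    gene: str
    audited_value: str
    reference_value: Optional[str]
    suggestion: str


def normalize_location(location: str) -> str:
    """Assumed normalization: trim whitespace and lowercase, so '17Q21 ' equals '17q21'."""
    return location.strip().lower()


def audit_locations(audited: Dict[str, str],
                    reference: Dict[str, str]) -> List[Discrepancy]:
    """Compare each audited gene location with the reference source.

    `audited` stands in for roles gathered from the terminology under audit,
    `reference` for the values gathered from the reference database.
    """
    findings: List[Discrepancy] = []
    for gene in sorted(audited):
        audited_value = audited[gene]
        reference_value = reference.get(gene)
        if reference_value is None:
            # No counterpart record: nothing to compare, so defer to a human.
            findings.append(Discrepancy(
                gene, audited_value, None,
                "no reference record found; manual review"))
        elif normalize_location(audited_value) != normalize_location(reference_value):
            # Values disagree after normalization: report a probable error.
            findings.append(Discrepancy(
                gene, audited_value, reference_value,
                f"probable error; consider replacing with {reference_value.strip()}"))
    return findings


# Made-up sample data (hypothetical gene symbols and locations).
sample_audited = {"GENE_A": "17q21", "GENE_B": "13q12.3", "GENE_C": "1p36"}
sample_reference = {"GENE_A": "17Q21 ", "GENE_B": "13q13.1"}

for finding in audit_locations(sample_audited, sample_reference):
    print(finding.gene, "->", finding.suggestion)
```

In this sketch a difference in case or surrounding whitespace is not treated as a discrepancy, while a gene with no reference record is routed to manual review and no correction is proposed. Auditing biological-process roles would additionally need synonym handling, which is not attempted here.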
This article is indexed in ScienceDirect, PubMed, and other databases.