首页 | 本学科首页   官方微博 | 高级检索  
     


Induction of comprehensible models for gene expression datasets by subgroup discovery methodology
Authors:Gamberger Dragan  Lavrac Nada  Zelezný Filip  Tolar Jakub
Affiliation:Laboratory for Information Systems, Rudjer Boskovi? Institute, Zagreb, Croatia. dragan.gamberger@irb.hr
Abstract:Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due the abundance of attributes (simultaneously measured gene expression values) and shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and consequently the biological insights they can provide. The goal of this study is to apply to this domain the methodology of constructing simple yet robust logic-based classifiers amenable to direct expert interpretation. On two well-known, publicly available gene expression classification problems, the paper shows the feasibility of this approach, employing a recently developed subgroup discovery methodology. Some of the discovered classifiers allow for novel biological interpretations.
Keywords:
本文献已被 PubMed 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号