A new dataset evaluation method based on category overlap |
| |
Authors: | Oh Sejong |
| |
Institution: | Department of Nanobiomedical Science, Dankook University, Cheonan 330-714, Republic of Korea |
| |
Abstract: | The quality of a dataset has a profound effect on classification accuracy, so a method for evaluating that quality is clearly needed. In this paper, we propose a new dataset evaluation method based on the R-value measure. The R-value is derived from the ratio of overlapping areas among the categories in a dataset. A high R-value indicates that the dataset contains wide overlapping areas among its categories, in which case classification accuracy on the dataset is likely to be low. The R-value measure can be used to understand the characteristics of a dataset, to guide the feature selection process, and to inform the design of new classifiers. |
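The abstract does not spell out how the overlapping areas are measured. As a rough illustration only, the sketch below assumes a k-nearest-neighbour notion of overlap. An instance is counted as lying in an overlapping area when more than `theta` of its `k` nearest neighbours (Euclidean distance) belong to a different category. The R-value is then the fraction of such instances. The function name `r_value` and the defaults `k=7` and `theta=3` are hypothetical choices for this sketch, not definitions or values confirmed by the paper.

```python
import math

def r_value(X, y, k=7, theta=3):
    """Hypothetical R-value sketch: fraction of instances in an overlapping area.

    Assumption (not taken from the paper): an instance overlaps when more than
    `theta` of its `k` nearest neighbours (Euclidean) have a different label.
    """
    n = len(X)
    if len(y) != n:
        raise ValueError("X and y must have the same length")
    if n <= k:
        raise ValueError("need more than k instances")
    overlapping = 0
    for i in range(n):
        # Distances from instance i to every other instance; ties break by index.
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbours = [j for _, j in dists[:k]]
        # Count neighbours that belong to another category.
        foreign = sum(1 for j in neighbours if y[j] != y[i])
        if foreign > theta:
            overlapping += 1
    return overlapping / n

# Two well-separated clusters: no instance lies in an overlapping area.
separated = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(r_value(separated, [0, 0, 0, 0, 1, 1, 1, 1], k=3, theta=1))  # 0.0

# Alternating labels on a line: every interior point has only foreign neighbours.
line = [(float(i),) for i in range(8)]
print(r_value(line, [0, 1, 0, 1, 0, 1, 0, 1], k=2, theta=1))  # 0.75
```

Under this assumed definition, a higher value means more instances sit among neighbours of other categories, which matches the abstract's claim that a high R-value signals wide overlap and likely lower classification accuracy.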
| |
Keywords: | Feature; Feature selection; R-value; Dataset; Classification; Machine learning algorithm |
This article is indexed in ScienceDirect, PubMed, and other databases. |