Kernel learning at the first level of inference
Abstract: Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense.
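The joint first-level optimisation described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes an ARD-RBF kernel with per-feature scales eta, a squared-error LS-SVM training criterion, an RKHS-norm penalty on the expansion coefficients, and an L1-style penalty on the kernel scales; the function and parameter names (fit_lssvm_first_level, lam_alpha, lam_eta) are hypothetical. The key point it demonstrates is that only the two regularisation parameters remain for model selection, while the coefficients and all kernel parameters are optimised together at the first level.

```python
# Minimal sketch (not the authors' code) of first-level kernel learning
# for an LS-SVM classifier with an assumed ARD-RBF kernel.
import numpy as np
from scipy.optimize import minimize

def ard_rbf(X, Z, eta):
    """ARD-RBF kernel: k(x, z) = exp(-sum_j eta_j * (x_j - z_j)^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2 * eta).sum(-1)
    return np.exp(-d2)

def fit_lssvm_first_level(X, y, lam_alpha=1e-2, lam_eta=1e-2):
    """Jointly optimise expansion coefficients, bias and ARD scales.

    Only lam_alpha and lam_eta (the two regularisation parameters)
    would remain to be tuned at the second level of inference.
    """
    n, d = X.shape

    def objective(theta):
        alpha, b, log_eta = theta[:n], theta[n], theta[n + 1:]
        eta = np.exp(log_eta)                       # keep ARD scales positive
        K = ard_rbf(X, X, eta)
        f = K @ alpha + b                           # decision values
        loss = np.sum((f - y) ** 2)                 # LS-SVM squared-error loss
        reg_alpha = lam_alpha * alpha @ K @ alpha   # RKHS-norm penalty on coefficients
        reg_eta = lam_eta * eta.sum()               # penalty on kernel parameters
        return loss + reg_alpha + reg_eta

    theta0 = np.zeros(n + 1 + d)                    # alpha = 0, b = 0, eta = 1
    res = minimize(objective, theta0, method="L-BFGS-B",
                   options={"maxiter": 200})
    alpha, b, eta = res.x[:n], res.x[n], np.exp(res.x[n + 1:])
    return alpha, b, eta

# Toy usage: two informative features and one irrelevant feature;
# the ARD penalty should drive the third scale towards zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.sign(X[:, 0] + X[:, 1])                      # labels in {-1, +1}
alpha, b, eta = fit_lssvm_first_level(X, y)
print("learned ARD scales:", eta)
```

In this sketch the kernel scales are optimised in log space so that positivity is maintained without constrained optimisation; a gradient-based solver is used here purely for simplicity.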
Keywords: Kernel methods; Model selection; Regularisation; Over-fitting; Automatic relevance determination
This article is indexed in ScienceDirect and other databases.