Abstract: The scalability of kernel machines is a big challenge when facing millions of samples due to storage and computation issues for large kernel matrices, that are usually dense. Recently, many papers have suggested tackling this problem by using a low-rank approximation of the kernel matrix. In this paper, we first make the observation that the structure of shift-invariant kernels changes from low-rank to block-diagonal (without any low-rank structure) when varying the
scale parameter. Based on this observation, we propose a new kernel approximation algorithm — Memory Efficient Kernel Approximation, which considers both low-rank and clustering structure of the kernel matrix. We show that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage. As an example, on the MNIST dataset with two-million samples, our method takes 550 seconds on a single machine using less than 500 MBytes memory to achieve 0.2313 test RMSE for kernel ridge regression, while standard Nystr”{o}m approximation takes more than 2700 seconds and uses more than 2 GBytes memory on the same problem to achieve 0.2318 test RMSE.
Download: pdf, slides, code
Citation
- Memory Efficient Kernel Approximation (pdf, slides, software, code)
S. Si, C. Hsieh, I. Dhillon.
In International Conference on Machine Learning (ICML), pp. 701-709, June 2014.
Bibtex: