Abstract: We consider Multiclass and Multilabel classification with extremely large number of classes. In such setting, standard methods that have training, prediction cost linear to the number of classes become intractable. State-of-the-art methods thus reduce the complexity via structural assumption between labels such as low rankness or balanced tree. However, in practice, structural assumption can be easily violated and thus degrades prediction accuracy. In this work, we show that a margin-maximizing loss with L1-penalty, in case of Extreme Classification, yields extremely sparse solution both in primal and in dual without sacrificing the expressive power of predictor. We thus propose a variant of Block-Coordinate Frank-Wolfe algorithm that exploits the sparsity to achieve a complexity sublinear to number of primal and dual variables. In our experiments on Extreme Multiclass and Multilabel problems, the proposed method gives superior performance in training time, accuracy and prediction speed.
- A Primal and Dual Sparse Approach to Extreme Classification (pdf, slides, poster, code)
I. Yen, X. Huang, K. Zhong, P. Ravikumar, I. Dhillon.
In International Conference on Machine Learning (ICML), June 2016. (Oral)