Abstract: The multi-label classification problem has generated significant interest in
recent years. However, existing approaches do not adequately address two key
challenges: (a) scaling up to problems with a large number (say
millions) of labels, and (b) handling data with missing labels. In
this paper, we directly address both these problems by studying the
multi-label problem in a generic empirical risk minimization (ERM) framework.
Despite its simplicity, our framework encompasses several recent
label-compression based methods, which can be derived as special cases.
To optimize the ERM problem, we develop techniques that exploit
the structure of specific loss functions, such as the squared loss,
to obtain efficient algorithms. We further show that
our learning framework admits excess risk bounds even in the presence
of missing labels. Our bounds are tight and demonstrate better
generalization performance for low-rank-promoting trace-norm regularization
than for (rank-insensitive) Frobenius-norm regularization. Finally, we
present extensive empirical results on a variety of benchmark datasets and
show that our methods perform significantly better than existing
label-compression based methods and can scale up to very large datasets, such
as a Wikipedia dataset with more than 200,000 labels.
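For concreteness, here is a minimal sketch of the kind of objective the abstract describes: a squared-loss ERM computed only over observed label entries, with a low-rank predictor X W Hᵀ whose factor norms serve as a standard surrogate for trace-norm regularization. The function name, variable names, and the explicit low-rank factorization are our own illustration under these assumptions, not the paper's actual implementation.

```python
import numpy as np

def masked_erm_objective(W, H, X, Y, mask, lam=0.1):
    """Illustrative ERM objective with squared loss and missing labels.

    X    : (n, d) feature matrix
    Y    : (n, L) binary label matrix; unobserved entries may be arbitrary
    mask : (n, L) indicator matrix, 1 where a label is observed
    W, H : low-rank factors of shapes (d, k) and (L, k); predictor is X @ W @ H.T
    lam  : regularization weight; (||W||_F^2 + ||H||_F^2) / 2 upper-bounds the
           trace norm of W @ H.T, a common low-rank-promoting surrogate.
    """
    # Squared loss restricted to observed label entries only.
    residual = mask * (X @ W @ H.T - Y)
    loss = 0.5 * np.sum(residual ** 2)
    # Frobenius norms of the factors as a trace-norm surrogate.
    reg = 0.5 * lam * (np.linalg.norm(W) ** 2 + np.linalg.norm(H) ** 2)
    return loss + reg

# Tiny usage example with random data (purely illustrative).
rng = np.random.default_rng(0)
n, d, L, k = 100, 20, 50, 5
X = rng.standard_normal((n, d))
Y = (rng.random((n, L)) < 0.1).astype(float)
mask = (rng.random((n, L)) < 0.7).astype(float)   # roughly 30% of labels missing
W, H = rng.standard_normal((d, k)), rng.standard_normal((L, k))
print(masked_erm_objective(W, H, X, Y, mask))
```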
Download: pdf, software
Citation
- Large-scale Multi-label Learning with Missing Labels (pdf, software)
H. Yu, P. Jain, P. Kar, I. Dhillon.
In International Conference on Machine Learning (ICML), pp. 593–601, June 2014.
Bibtex: