Large-scale Multi-label Learning
Project Summary
Multi-label classification problems abound in practice; as a result, many methods have recently been proposed for these problems. However, there are two key challenges that have not been adequately addressed: (a) the number of labels can be very large, for example in the millions, and (b) the training data can be riddled with missing labels. In this project, we study a generic framework for multi-label classification that directly addresses both challenges. In particular, we pose the problem as one of empirical risk minimization (ERM), where the prediction function is parameterized by a low-rank matrix. A key facet of our approach is that we handle missing labels in the training set by applying techniques from the domain of matrix completion. To develop a scalable algorithm that can handle a large number of labels, we use alternating minimization to find the low-rank parameter matrix. Finally, we present empirical results on a variety of benchmark datasets and show that our methods perform significantly better than existing label-compression-based methods.
Project Description
Due to several motivating real-life applications, recent research on multi-label classification has largely shifted its focus to the other end of the spectrum, where the number of labels is assumed to be extremely large; the key challenge is then the design of scalable algorithms that offer real-time predictions and have a small memory footprint. In such situations, simple methods such as 1-vs-all or Binary Relevance (BR), which treat each label as a separate binary classification problem, fail miserably. For a problem with (say) ten thousand labels and one million features, which is common in several applications, these methods have a memory footprint of around 100 gigabytes and offer slow predictions.
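The memory estimate above can be checked with a quick back-of-the-envelope calculation, assuming one dense double-precision weight vector per label (the exact figure depends on the storage format):

```python
# Binary Relevance stores one dense weight vector per label.
n_labels = 10_000
n_features = 1_000_000
bytes_per_weight = 8  # float64

footprint_gb = n_labels * n_features * bytes_per_weight / 1e9
print(footprint_gb)  # 80.0 GB, i.e. on the order of the ~100 GB quoted above
```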
A common technique used to handle the label proliferation problem in several recent works is “label space reduction”. The key idea is to reduce the dimensionality of the label space using either random projections or canonical correlation analysis (CCA) based projections. Subsequently, these methods perform prediction in the lower-dimensional label space and then recover the original labels by projecting back onto the high-dimensional label space.
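As a minimal sketch of the compression idea, the snippet below uses a Gaussian random projection and a simple transpose-based decoding; this is one illustrative choice, since actual methods may instead use CCA projections or a pseudo-inverse for decoding:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_labels, k = 100, 1000, 50

# Sparse binary label matrix (about 5% of entries active).
Y = (rng.random((n_samples, n_labels)) < 0.05).astype(float)

# Random projection onto a k-dimensional compressed label space.
P = rng.standard_normal((n_labels, k)) / np.sqrt(k)

Z = Y @ P          # compressed labels; regressors are trained to predict Z from features
Y_hat = Z @ P.T    # decode: project back to the full label space
pred = Y_hat > 0.5 # threshold to recover binary label predictions
```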
 LEML: Low-rank ERM for Multi-Label Learning
In this project, we take a more direct approach by formulating the problem as that of learning a low-rank linear model Z. We cast this learning problem in the standard ERM framework, which allows us to use a variety of loss functions and regularizers for Z. This framework unifies several existing dimensionality reduction approaches.
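For concreteness, here is a minimal sketch of the resulting objective under squared loss with Frobenius-norm regularization, one choice among the losses and regularizers the framework admits; the function name and signature are illustrative, not from the released software:

```python
import numpy as np

def leml_objective(W, H, X, Y, lam):
    """ERM objective with the low-rank model Z = W @ H.T.

    Squared loss plus Frobenius regularization on the factors
    (a sketch; the framework supports other losses as well).
    """
    Z = W @ H.T                     # low-rank parameter matrix
    residual = X @ Z - Y            # prediction error on the label matrix
    loss = 0.5 * np.sum(residual ** 2)
    reg = 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))
    return loss + reg
```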
 LEML: Matrix Factorization with Side Features
To learn the LEML model efficiently, we consider the decomposition Z = WH^T and apply alternating minimization to obtain W and H. Unlike in traditional matrix factorization, updating W is more challenging. Based on the structure of the ERM problem, we develop efficient Hessian-vector multiplication routines so that iterative methods such as conjugate gradient can be applied to solve the subproblem efficiently. In fact, from this point of view, LEML can be regarded as matrix factorization with row features.
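As a rough illustration of this idea, the W-subproblem under squared loss can be solved with conjugate gradient using only Hessian-vector products, never materializing the full Hessian. The sketch below assumes squared loss and fully observed labels; the function names are illustrative and not taken from the released software:

```python
import numpy as np

def hessvec(S, X, H, lam):
    """Hessian-vector product for the W-subproblem (H fixed, squared loss).

    Applies X^T X S (H^T H) + lam * S without forming the
    (d*k x d*k) Hessian explicitly.
    """
    return X.T @ ((X @ S) @ (H.T @ H)) + lam * S

def solve_W(X, Y, H, lam, iters=50, tol=1e-8):
    """Conjugate gradient on the normal equations for W, with H fixed."""
    d, k = X.shape[1], H.shape[1]
    W = np.zeros((d, k))
    R = X.T @ (Y @ H)          # negative gradient at W = 0
    P = R.copy()
    rs = np.sum(R * R)
    for _ in range(iters):
        Ap = hessvec(P, X, H, lam)
        alpha = rs / np.sum(P * Ap)
        W += alpha * P
        R -= alpha * Ap
        rs_new = np.sum(R * R)
        if rs_new < tol:
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return W
```

A full alternating-minimization pass would call this routine for W, then solve the analogous subproblem for H, and repeat until convergence.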

Experimental Results: We perform extensive experiments on a variety of benchmark datasets. The empirical results show that LEML outperforms other state-of-the-art methods in various settings.
Main Publications
 Large-scale Multi-label Learning with Missing Labels (pdf, software)
H. Yu, P. Jain, P. Kar, I. Dhillon.
In International Conference on Machine Learning (ICML), pp. 593–601, June 2014.