Information-Theoretic Metric Learning

Abstract: In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem—that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it is fast and scalable. Unlike most existing methods, no eigenvalue computations or semi-definite programming are required. We also present an online version and derive regret bounds for the resulting algorithm. Finally, we evaluate our method on a recent error reporting system for software called Clarify, in the context of metric learning for nearest neighbor classification, as well as on standard data sets.
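The quantities named in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm or the authors' software: it only computes a squared Mahalanobis distance and the LogDet divergence, then applies one rank-one LogDet projection onto a single equality constraint on a distance. The function names (`mahalanobis_sq`, `logdet_divergence`, `project_onto_distance`) and the choice of one equality constraint are assumptions made for illustration.

```python
import numpy as np


def mahalanobis_sq(A, x, y):
    """Squared Mahalanobis distance d_A(x, y) = (x - y)^T A (x - y)."""
    v = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(v @ A @ v)


def logdet_divergence(A, A0):
    """LogDet divergence D_ld(A, A0) = tr(A A0^-1) - log det(A A0^-1) - n."""
    n = A.shape[0]
    # A0^-1 A has the same trace and determinant as A A0^-1.
    M = np.linalg.solve(A0, A)
    sign, logabsdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("A and A0 must both be positive definite")
    return float(np.trace(M) - logabsdet - n)


def project_onto_distance(A, x, y, target):
    """Illustrative single step (an assumption, not the full method):
    LogDet projection of a symmetric positive definite A onto the set
    {A : d_A(x, y) = target}, done as a rank-one update.
    """
    v = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    p = float(v @ A @ v)  # current squared distance
    if p <= 0 or target <= 0:
        raise ValueError("distances must be strictly positive")
    # Choose beta so that p + beta * p**2 == target.
    beta = (target - p) / p**2
    Av = A @ v  # A is symmetric, so A v v^T A = (A v)(A v)^T
    return A + beta * np.outer(Av, Av)


if __name__ == "__main__":
    A0 = np.eye(2)  # prior: plain squared Euclidean distance
    x, y = np.array([1.0, 2.0]), np.array([0.0, 0.0])
    A1 = project_onto_distance(A0, x, y, target=1.0)
    print("distance before:", mahalanobis_sq(A0, x, y))
    print("distance after: ", mahalanobis_sq(A1, x, y))
    print("LogDet divergence from prior:", logdet_divergence(A1, A0))
```

The update only needs matrix-vector products, which is consistent with the abstract's remark that no eigenvalue computations or semi-definite programming are required. A complete method would cycle such projections over many constraints, which this sketch does not attempt.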

Citation

Information-Theoretic Metric Learning
J. Davis, B. Kulis, P. Jain, S. Sra, I. Dhillon.
In International Conference on Machine Learning (ICML), June 2007.

BibTeX:
@inproceedings{davis2007informatio,
  author = "Jason V. Davis AND Brian J. Kulis AND Prateek Jain AND Suvrit Sra AND Inderjit S. Dhillon",
  title = "Information-Theoretic Metric Learning",
  booktitle = "International Conference on Machine Learning (ICML)",
  number = "0",
  year = "2007",
  month = "jun",
  abstract = "In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem—that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it is fast and scalable. Unlike most existing methods, no eigenvalue computations or semi-definite programming are required. We also present an online version and derive regret bounds for the resulting algorithm. Finally, we evaluate our method on a recent error reporting system for software called Clarify, in the context of metric learning for nearest neighbor classification, as well as on standard data sets."
}