Fast Classification with Binary Prototypes | Center for Big Data Analytics

Abstract: In this work, we propose a new technique for fast k-nearest neighbor (k-NN) classification in which the original database is represented via a small set of learned binary prototypes. The training phase simultaneously learns a hash function which maps the data points to binary codes, and a set of representative binary prototypes. In the prediction phase, we first hash the query into a binary code and then do the k-NN classification using the binary prototypes as the database. Our approach speeds up k-NN classification in two aspects. First, we compress the database into a smaller set of prototypes such that k-NN search only goes through a smaller set rather than the whole dataset. Second, we reduce the original space to a compact binary embedding, where the Hamming distance between two binary codes is very efficient to compute. We propose a formulation to learn the hash function and prototypes such that the classification error is minimized. We also provide a novel theoretical analysis of the proposed technique in terms of Bayes error consistency. Empirically, our method is much faster than the state-of-the-art k-NN compression methods with comparable accuracy.
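The prediction phase described in the abstract (hash the query, then run k-NN over the binary prototypes under Hamming distance) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the hash is taken to be the sign of a linear projection, and the projection matrix `W`, the prototype codes, and the labels are toy values set by hand rather than learned. The paper's actual hash family and training procedure are not reproduced.

```python
from collections import Counter


def hash_to_code(x, W):
    """Hypothetical hash: one bit per row of W, set when the projection is positive."""
    code = 0
    for row in W:
        bit = 1 if sum(w * v for w, v in zip(row, x)) > 0 else 0
        code = (code << 1) | bit  # pack bits into a single integer
    return code


def hamming(a, b):
    """Hamming distance between two integer-packed binary codes."""
    return bin(a ^ b).count("1")


def predict(x, W, prototypes, labels, k=1):
    """Hash the query, then take a majority vote over the k nearest prototypes."""
    q = hash_to_code(x, W)
    order = sorted(range(len(prototypes)), key=lambda i: hamming(q, prototypes[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy values (assumed, not learned): 3-bit codes over 2-D inputs.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototypes = [0b111, 0b000]
labels = ["pos", "neg"]

print(predict([2.0, 3.0], W, prototypes, labels))
print(predict([-1.0, -2.0], W, prototypes, labels))
```

Packing each code into an integer lets the distance reduce to an XOR followed by a bit count, which is why Hamming comparisons are cheap relative to distances in the original space. The search cost also scales with the number of prototypes rather than with the size of the original database.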

Topics:
Big Data

Download: pdf

Citation

Fast Classification with Binary Prototypes (pdf, software)
K. Zhong, R. Guo, S. Kumar, B. Yan, D. Simcha, I. Dhillon.
In International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

Bibtex:
@inproceedings{zhong2017fastclass,
  author = "Kai Zhong AND Ruiqi Guo AND Sanjiv Kumar AND Bowei Yan AND David Simcha AND Inderjit S. Dhillon",
  title = "Fast Classification with Binary Prototypes",
  booktitle = "International Conference on Artificial Intelligence and Statistics (AISTATS)",
  year = "2017",
  abstract = "In this work, we propose a new technique for fast k-nearest neighbor (k-NN) classification in which the original database is represented via a small set of learned binary prototypes. The training phase simultaneously learns a hash function which maps the data points to binary codes, and a set of representative binary prototypes. In the prediction phase, we first hash the query into a binary code and then do the k-NN classification using the binary prototypes as the database. Our approach speeds up k-NN classification in two aspects. First, we compress the database into a smaller set of prototypes such that k-NN search only goes through a smaller set rather than the whole dataset. Second, we reduce the original space to a compact binary embedding, where the Hamming distance between two binary codes is very efficient to compute. We propose a formulation to learn the hash function and prototypes such that the classification error is minimized. We also provide a novel theoretical analysis of the proposed technique in terms of Bayes error consistency. Empirically, our method is much faster than the state-of-the-art k-NN compression methods with comparable accuracy."
}