Matrix Completion from Power-Law Distributed Samples

Abstract: The low-rank matrix completion problem is a fundamental problem with many important applications. Recently, [4],[13] and [5] obtained the first non-trivial theoretical results for the problem assuming that the observed entries are sampled uniformly at random. Unfortunately, most real-world datasets do not satisfy this assumption, but instead exhibit power-law distributed samples. In this paper, we propose a graph theoretic approach to matrix completion that solves the problem for more realistic sampling models. Our method is simpler to analyze than previous methods with the analysis reducing to computing the threshold for complete cascades in random graphs, a problem of independent interest. By analyzing the graph theoretic problem, we show that our method achieves exact recovery when the observed entries are sampled from the Chung-Lu-Vu model, which can generate power-law distributed graphs. We also hypothesize that our algorithm solves the matrix completion problem from an optimal number of entries for the popular preferential attachment model and provide strong empirical evidence for the claim. Furthermore, our method is easy to implement and is substantially faster than existing methods. We demonstrate the effectiveness of our method on random instances where the low-rank matrix is sampled according to the prevalent random graph models for complex networks and present promising preliminary results on the Netflix challenge dataset.

Topics:
Matrix Completion

Download: pdf

Citation

Matrix Completion from Power-Law Distributed Samples (pdf, software)
R. Meka, P. Jain, I. Dhillon.
In Neural Information Processing Systems (NIPS), pp. 1258-1266, December 2009.

Bibtex:
@inproceedings{meka2009matrixcom, author = "Raghu Meka AND Prateek Jain AND Inderjit S. Dhillon", title = "Matrix Completion from Power-Law Distributed Samples", booktitle = "Neural Information Processing Systems (NIPS)", page = "1258–1266", number = "0", year = "2009", month = "dec", abstract = "The low-rank matrix completion problem is a fundamental problem with many important applications. Recently, [4],[13] and [5] obtained the first non-trivial theoretical results for the problem assuming that the observed entries are sampled uniformly at random. Unfortunately, most real-world datasets do not satisfy this assumption, but instead exhibit power-law distributed samples. In this paper, we propose a graph theoretic approach to matrix completion that solves the problem for more realistic sampling models. Our method is simpler to analyze than previous methods with the analysis reducing to computing the threshold for complete cascades in random graphs, a problem of independent interest. By analyzing the graph theoretic problem, we show that our method achieves exact recovery when the observed entries are sampled from the Chung-Lu-Vu model, which can generate power-law distributed graphs. We also hypothesize that our algorithm solves the matrix completion problem from an optimal number of entries for the popular preferential attachment model and provide strong empirical evidence for the claim. Furthermore, our method is easy to implement and is substantially faster than existing methods. We demonstrate the effectiveness of our method on random instances where the low-rank matrix is sampled according to the prevalent random graph models for complex networks and present promising preliminary results on the Netflix challenge dataset." }

Center for Big Data Analytics

Matrix Completion from Power-Law Distributed Samples

Raghu Meka, Prateek Jain, Inderjit Dhillon

Download: pdf

Citation