Multi-Scale Spectral Decomposition of Massive Graphs

Abstract: Computing the $k$ dominant eigenvalues and eigenvectors of massive graphs is a key operation in numerous machine learning applications; however, popular solvers suffer from slow convergence, especially when $k$ is reasonably large. In this paper, we propose and analyze a novel multi-scale spectral decomposition method (MSEIGS), which first clusters the graph into smaller clusters whose spectral decomposition can be computed efficiently and independently. We show theoretically as well as empirically that the union of all cluster’s subspaces has significant overlap with the dominant subspace of the original graph, provided that the graph is clustered appropriately. Thus, eigenvectors of the clusters serve as good initializations to a block Lanczos algorithm that is used to compute spectral decomposition of the original graph. We further use hierarchical clustering to speed up the computation and adopt a fast early termination strategy to compute quality approximations. Our method outperforms widely used solvers in terms of convergence speed and approximation quality. Furthermore, our method is naturally parallelizable and exhibits significant speedups in shared-memory parallel settings. For example, on a graph with more than 82 million nodes and 3.6 billion edges, MSEIGS takes less than 3 hours on a single-core machine while Randomized SVD takes more than 6 hours, to obtain a similar approximation of the top-50 eigenvectors. Using 16 cores, we can reduce this time to less than 40 minutes.

Download: pdf

Citation

Multi-Scale Spectral Decomposition of Massive Graphs (pdf, software)
S. Si, D. Shin, I. Dhillon, B. Parlett.
In Neural Information Processing Systems (NIPS), pp. 2798-2806, December 2014.

Bibtex:
@inproceedings{si2014multiscal, author = "Si Si AND Donghyuk Shin AND Inderjit S. Dhillon AND Beresford N. Parlett", title = "Multi-Scale Spectral Decomposition of Massive Graphs", booktitle = "Neural Information Processing Systems (NIPS)", page = "2798–2806", year = "2014", month = "dec", abstract = "Computing the $k$ dominant eigenvalues and eigenvectors of massive graphs is a key operation in numerous machine learning applications; however, popular solvers suffer from slow convergence, especially when $k$ is reasonably large. In this paper, we propose and analyze a novel multi-scale spectral decomposition method (MSEIGS), which first clusters the graph into smaller clusters whose spectral decomposition can be computed efficiently and independently. We show theoretically as well as empirically that the union of all cluster{'}s subspaces has significant overlap with the dominant subspace of the original graph, provided that the graph is clustered appropriately. Thus, eigenvectors of the clusters serve as good initializations to a block Lanczos algorithm that is used to compute spectral decomposition of the original graph. We further use hierarchical clustering to speed up the computation and adopt a fast early termination strategy to compute quality approximations. Our method outperforms widely used solvers in terms of convergence speed and approximation quality. Furthermore, our method is naturally parallelizable and exhibits significant speedups in shared-memory parallel settings. For example, on a graph with more than 82 million nodes and 3.6 billion edges, MSEIGS takes less than 3 hours on a single-core machine while Randomized SVD takes more than 6 hours, to obtain a similar approximation of the top-50 eigenvectors. Using 16 cores, we can reduce this time to less than 40 minutes." }

Associated Projects

Divide & Conquer Methods for Big Data Analytics

Center for Big Data Analytics

Multi-Scale Spectral Decomposition of Massive Graphs

Si Si, Donghyuk Shin, Inderjit Dhillon, Beresford Parlett

Download: pdf

Citation

Associated Projects