A Fast Kernel-based Multilevel Algorithm for Graph Clustering

Abstract: Graph clustering (also called graph partitioning), the task of clustering the nodes of a graph, is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation, which can be slow. Recently, graph clustering with a general cut objective has been shown to be mathematically equivalent to an appropriate weighted kernel k-means objective function. In this paper, we exploit this equivalence to develop a very fast multilevel algorithm for graph clustering. Multilevel approaches involve coarsening, initial partitioning, and refinement phases, all of which may be specialized to different graph clustering objectives. Unlike existing multilevel clustering approaches, such as METIS, our algorithm does not constrain the cluster sizes to be nearly equal. Our approach gives a theoretical guarantee that the refinement step decreases the graph cut objective under consideration. Experiments show that we achieve better final objective function values than a state-of-the-art spectral clustering algorithm: on a series of benchmark test graphs with up to thirty thousand nodes and one million edges, our algorithm achieves lower normalized cut values in 67% of our experiments and higher ratio association values in 100% of our experiments. Furthermore, on large graphs, our algorithm is significantly faster than spectral methods. Finally, our algorithm requires far less memory than spectral methods; we cluster a 1.2 million node movie network into 5000 clusters, which, due to memory requirements, cannot be done directly with spectral methods.
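To make the two objectives named in the abstract concrete, the following is a minimal sketch that evaluates normalized cut and ratio association for a given partition. It uses the standard textbook definitions of these objectives and is not taken from the paper's software; the function names, the dense adjacency-list representation, and the toy graph are all assumptions chosen for illustration.

```python
def links(adj, rows, cols):
    """Sum of edge weights adj[i][j] over i in rows and j in cols."""
    return sum(adj[i][j] for i in rows for j in cols)


def normalized_cut(adj, labels):
    """Sum over clusters of (weight leaving the cluster) / (total degree of the cluster).

    Lower is better. Clusters with zero total degree contribute nothing.
    """
    nodes = range(len(adj))
    total = 0.0
    for c in set(labels):
        inside = [i for i in nodes if labels[i] == c]
        outside = [i for i in nodes if labels[i] != c]
        degree = links(adj, inside, nodes)
        if degree > 0:
            total += links(adj, inside, outside) / degree
    return total


def ratio_association(adj, labels):
    """Sum over clusters of (weight inside the cluster) / (cluster size).

    Higher is better. With a symmetric matrix each internal edge is counted twice.
    """
    nodes = range(len(adj))
    total = 0.0
    for c in set(labels):
        inside = [i for i in nodes if labels[i] == c]
        total += links(adj, inside, inside) / len(inside)
    return total


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = 1
    adjacency[v][u] = 1

labels = [0, 0, 0, 1, 1, 1]  # one cluster per triangle

print(normalized_cut(adjacency, labels))     # 2/7: each cluster cuts 1 edge, has degree 7
print(ratio_association(adjacency, labels))  # 4.0: each cluster has internal weight 6, size 3
```

Note that neither objective forces the clusters to have equal size, which is the property the abstract contrasts with METIS. The dense list-of-lists representation is only practical for small examples; graphs at the scale mentioned in the abstract would need a sparse representation.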

Download: pdf, software

Citation

A Fast Kernel-based Multilevel Algorithm for Graph Clustering
I. Dhillon, B. Kulis.
In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 629-634, August 2005.

Bibtex:
@inproceedings{dhillon2005afastker,
  author = "Inderjit S. Dhillon and Brian J. Kulis",
  title = "A Fast Kernel-based Multilevel Algorithm for Graph Clustering",
  booktitle = "ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)",
  pages = "629--634",
  year = "2005",
  month = aug,
  abstract = "Graph clustering (also called graph partitioning) --- clustering the nodes of a graph --- is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation which can be slow. Recently, graph clustering with a general cut objective has been shown to be mathematically equivalent to an appropriate weighted kernel k-means objective function. In this paper, we exploit this equivalence to develop a very fast multilevel algorithm for graph clustering. Multilevel approaches involve coarsening, initial partitioning and refinement phases, all of which may be specialized to different graph clustering objectives. Unlike existing multilevel clustering approaches, such as METIS, our algorithm does not constrain the cluster sizes to be nearly equal. Our approach gives a theoretical guarantee that the refinement step decreases the graph cut objective under consideration. Experiments show that we achieve better final objective function values as compared to a state-of-the-art spectral clustering algorithm: on a series of benchmark test graphs with up to thirty thousand nodes and one million edges, our algorithm achieves lower normalized cut values in 67% of our experiments and higher ratio association values in 100% of our experiments. Furthermore, on large graphs, our algorithm is significantly faster than spectral methods. Finally, our algorithm requires far less memory than spectral methods; we cluster a 1.2 million node movie network into 5000 clusters, which due to memory requirements cannot be done directly with spectral methods."
}

Software

Graclus

Center for Big Data Analytics
