Clustering with Multiple Graphs | Center for Big Data Analytics

Abstract: In graph-based learning models, entities are often represented as vertices in an undirected graph with weighted edges describing the relationships between entities. In many real-world applications, however, entities are often associated with relations of different types and/or from different sources, which can be well captured by multiple undirected graphs over the same set of vertices. How to exploit such multiple sources of information to make better inferences on entities remains an interesting open problem. In this paper, we focus on the problem of clustering the vertices based on multiple graphs in both unsupervised and semi-supervised settings. As one of our contributions, we propose Linked Matrix Factorization (LMF) as a novel way of fusing information from multiple graph sources. In LMF, each graph is approximated by matrix factorization with a graph-specific factor and a factor common to all graphs, where the common factor provides features for all vertices. Experiments on SIAM journal data show that (1) we can improve the clustering accuracy through fusing multiple sources of information with several models, and (2) LMF yields superior or competitive results compared to other graph-based clustering methods.

Topics:
Graph Clustering

Download: pdf

Citation

Clustering with Multiple Graphs (pdf, software)
W. Tang, Z. Lu, I. Dhillon.
In IEEE International Conference on Data Mining (ICDM), December 2009.

Bibtex:
@inproceedings{tang2009clustering, author = "Wei Tang AND Zhengdong Lu AND Inderjit S. Dhillon", title = "Clustering with Multiple Graphs", booktitle = "IEEE International Conference on Data Mining (ICDM)", number = "0", year = "2009", month = "dec", abstract = "In graph-based learning models, entities are often represented as vertices in an undirected graph with weighted edges describing the relationships between entities. In many real-world applications, however, entities are often associated with relations of different types and/or from different sources, which can be well captured by multiple undirected graphs over the same set of vertices. How to exploit such multiple sources of information to make better inferences on entities remains an interesting open problem. In this paper, we focus on the problem of clustering the vertices based on multiple graphs in both unsupervised and semi-supervised settings. As one of our contributions, we propose Linked Matrix Factorization (LMF) as a novel way of fusing information from multiple graph sources. In LMF, each graph is approximated by matrix factorization with a graph-specific factor and a factor common to all graphs, where the common factor provides features for all vertices. Experiments on SIAM journal data show that (1) we can improve the clustering accuracy through fusing multiple sources of information with several models, and (2) LMF yields superior or competitive results compared to other graph-based clustering methods." }