Abstract: Semi-supervised clustering algorithms aim to improve clustering results using
limited supervision. The supervision is generally given as pairwise constraints; such constraints
are natural for graphs, yet most semi-supervised clustering algorithms are designed
for data represented as vectors. In this paper, we unify vector-based and graph-based approaches.
We first show that a recently-proposed objective function for semi-supervised
clustering based on Hidden Markov Random Fields, with squared Euclidean distance and
a certain class of constraint penalty functions, can be expressed as a special case of the
weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International
Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection
between weighted kernel k-means and several graph clustering objectives enables
us to perform semi-supervised clustering of data given either as vectors or as a graph. For
graph data, this result leads to algorithms for optimizing several new semi-supervised graph
clustering objectives. For vector data, the kernel approach also enables us to find clusters
with non-linear boundaries in the input data space. Furthermore, we show that recent work
on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference
on Artificial Intelligence, 2003) may be viewed as a special case of our formulation.
We empirically show that our algorithm is able to outperform current state-of-the-art semisupervised
algorithms on both vector-based and graph-based data sets.
Download: pdf
Citation
- Semi-Supervised Graph Clustering: A Kernel Approach (pdf, software)
B. Kulis, S. Basu, I. Dhillon, R. Mooney.
Machine Learning 74(1), pp. 1-22, January 2009.
Bibtex: