Kernel k-means, Spectral Clustering and Normalized Cuts

Inderjit Dhillon, Yuqiang Guan, Brian Kulis

Abstract: Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have remained only loosely related. In this paper, we give an explicit theoretical connection between them. We show the generality of the weighted kernel k-means objective function, and derive the spectral clustering objective of normalized cut as a special case. Given a positive definite similarity matrix, our results lead to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut. This has important implications: a) eigenvector-based algorithms, which can be computationally prohibitive, are not essential for minimizing normalized cuts; and b) various techniques, such as local search and acceleration schemes, may be used to improve the quality as well as the speed of kernel k-means. Finally, we present results on several interesting data sets, including diametrical clustering of large gene-expression matrices and a handwriting recognition data set.
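The central claim of the abstract, that normalized cut is a special case of the weighted kernel k-means objective, can be illustrated with a short sketch. It assumes the standard reduction for a positive definite affinity matrix A: each point is weighted by its degree d_i = sum_j A_ij, and the kernel is K = D^{-1} A D^{-1} with D = diag(d). With that choice the weighted kernel k-means objective works out to trace(D^{-1} A) - k + NCut (k being the number of non-empty clusters), so a batch reassignment pass, which never increases the objective when K is positive semidefinite, also never increases the normalized cut. The function names (`normalized_cut`, `weighted_kernel_kmeans`, `ncut_kernel_kmeans`) are hypothetical; this is a minimal NumPy illustration, not the authors' released software.

```python
import numpy as np


def normalized_cut(A, labels, k):
    """NCut = sum over non-empty clusters of links(V_c, V \\ V_c) / degree(V_c)."""
    deg = A.sum(axis=1)
    total = 0.0
    for c in range(k):
        mask = labels == c
        if not mask.any():
            continue
        within = A[np.ix_(mask, mask)].sum()   # links(V_c, V_c)
        vol = deg[mask].sum()                  # degree(V_c)
        total += (vol - within) / vol          # links(V_c, V \ V_c) / degree(V_c)
    return total


def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch weighted kernel k-means; assumes K is PSD and all weights w are positive."""
    labels = np.asarray(labels).copy()
    n = K.shape[0]
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)         # empty clusters stay at +inf
        for c in range(k):
            mask = labels == c
            if not mask.any():
                continue
            wc = w[mask]
            sc = wc.sum()
            # ||phi(a_i) - m_c||^2 = K_ii - 2 sum_j w_j K_ij / s_c + sum_{j,l} w_j w_l K_jl / s_c^2
            cross = K[:, mask] @ wc / sc
            self_term = wc @ K[np.ix_(mask, mask)] @ wc / sc**2
            dist[:, c] = diag - 2.0 * cross + self_term
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # converged: no point changed cluster
        labels = new_labels
    return labels


def ncut_kernel_kmeans(A, labels, k, max_iter=100):
    """Assumed reduction: weights = degrees, kernel = D^{-1} A D^{-1}."""
    deg = A.sum(axis=1)
    K = A / np.outer(deg, deg)                 # entry (i, j) is A_ij / (d_i * d_j)
    return weighted_kernel_kmeans(K, deg, labels, k, max_iter)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq / 2.0)                      # Gaussian affinity: positive definite for distinct points
    init = rng.integers(0, 2, size=len(X))
    final = ncut_kernel_kmeans(A, init, k=2)
    print("NCut before:", normalized_cut(A, init, 2))
    print("NCut after: ", normalized_cut(A, final, 2))
```

The demo builds a Gaussian affinity matrix on two synthetic point clouds and starts from a random labelling; the point of the sketch is that the printed normalized cut after clustering is no larger than the initial one, without computing any eigenvectors.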


Citation

  • Kernel k-means, Spectral Clustering and Normalized Cuts
    I. Dhillon, Y. Guan, B. Kulis.
    In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 551-556, August 2004.
    (A longer version appears as UTCS Technical Report #TR-04-25, June 30, 2004.)


Software