A Scalable Framework for Discovering Coherent Co-clusters in Noisy Data

Meghana Deodhar, Gunjan Gupta, Joydeep Ghosh, Hyuk Cho, Inderjit Dhillon

Abstract: Clustering problems often involve datasets where only a part of the data is relevant to the problem; e.g., in microarray data analysis, only a subset of the genes shows cohesive expression within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to find coherent and meaningful clusters in such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable than one that is restricted to traditional “one-sided” clustering. We propose Robust Overlapping Co-Clustering (ROCC), a scalable and versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it well suited to a number of real-life applications.

Download: pdf

Citation

  • A Scalable Framework for Discovering Coherent Co-clusters in Noisy Data (pdf, software)
    M. Deodhar, G. Gupta, J. Ghosh, H. Cho, I. Dhillon.
    In International Conference on Machine Learning (ICML), June 2009.
