Abstract: Clustering problems often involve datasets
where only a part of the data is relevant to
the problem, e.g., in microarray data analysis
only a subset of the genes show cohesive
expressions within a subset of the
conditions/features. The existence of a large
number of non-informative data points and
features makes it challenging to hunt for coherent
and meaningful clusters from such
datasets. Additionally, since clusters could
exist in different subspaces of the feature
space, a co-clustering algorithm that simultaneously
clusters objects and features is often
more suitable than one that
is restricted to traditional “one-sided” clustering.
We propose Robust Overlapping
Co-Clustering (ROCC), a scalable and very versatile
framework that addresses the problem
of efficiently mining dense, arbitrarily positioned,
possibly overlapping co-clusters from
large, noisy datasets. ROCC has several desirable
properties that make it extremely well
suited to a number of real-life applications.
Topics: Co-Clustering
Citation
- A Scalable Framework for Discovering Coherent Co-clusters in Noisy Data (pdf, software)
M. Deodhar, G. Gupta, J. Ghosh, H. Cho, I. Dhillon.
In International Conference on Machine Learning (ICML), June 2009.