Abstract: Clustering problems often involve datasets where only a
part of the data is relevant to the problem. For example, in microarray
data analysis, only a subset of the genes shows cohesive
expression within a subset of the conditions/features. The
existence of a large number of non-informative data points
and features makes it challenging to hunt for coherent and
meaningful clusters from such datasets. Additionally, since
clusters could exist in different subspaces of the feature
space, a co-clustering algorithm that simultaneously clusters
objects and features is often more suitable than one that is
restricted to traditional “one-sided” clustering. We propose
Robust Overlapping Co-clustering (ROCC), a scalable and versatile
framework that addresses the problem of efficiently mining dense,
arbitrarily positioned, possibly overlapping co-clusters from large,
noisy datasets. ROCC has several desirable properties that
make it well suited to a number of real-life applications.
Through extensive experimentation, we show that
our approach is significantly more accurate at identifying
biologically meaningful co-clusters in microarray data than
several other prominent approaches that have
been applied to this task. We also point out other interesting
applications of the proposed framework in solving difficult
clustering problems.
- Topics:
  - Co-Clustering
Download: pdf
Citation
- Hunting for Coherent Co-clusters in High Dimensional and Noisy Datasets (pdf, software)
M. Deodhar, H. Cho, G. Gupta, J. Ghosh, I. Dhillon.
In IEEE International Conference on Data Mining (ICDM), December 2008.