Abstract: It is widely agreed in microarray analysis that identifying potential local patterns, characterized by coherent groups of genes
and conditions, may shed light on previously undetectable biological processes involving those genes, as well as
macroscopic phenotypes of related samples. In order to simultaneously cluster genes and conditions, we have previously developed a
fast coclustering algorithm, Minimum Sum-Squared Residue Coclustering (MSSRCC), which employs an alternating minimization
scheme and generates what we call coclusters in a “checkerboard” structure. In this paper, we propose specific strategies that enable
MSSRCC to escape poor local minima and resolve the degeneracy problem in partitional clustering algorithms. The strategies include
binormalization, deterministic spectral initialization, and incremental local search. We assess the effects of various strategies on both
synthetic gene expression data sets and real human cancer microarrays and provide empirical evidence that MSSRCC with the
proposed strategies performs better than existing coclustering and clustering algorithms. In particular, the combination of all three
strategies leads to the best performance. Furthermore, we illustrate the coherence of the resulting coclusters in a checkerboard structure,
where genes in a cocluster manifest the phenotype structure of the corresponding samples, and we evaluate the enrichment of
functional annotations in Gene Ontology (GO).
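The alternating minimization scheme described in the abstract can be illustrated with a small sketch. It assumes the simplest form of residue, in which each entry is compared with the mean of the cocluster it falls in; the paper's own residue definition may differ (for example, by also removing row and column means within each cocluster). All function names are hypothetical, initialization is random, and none of the three proposed strategies (binormalization, spectral initialization, incremental local search) is included, so this version can still stall in the poor local minima the paper sets out to escape.

```python
import random


def cocluster_means(A, rows, cols, k, l):
    """Mean of every (row cluster, column cluster) block; empty blocks get 0.0."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            sums[rows[i]][cols[j]] += a
            counts[rows[i]][cols[j]] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(l)] for r in range(k)]


def sum_squared_residue(A, rows, cols, k, l):
    """Objective (assumed residue): sum of (a_ij - mean of its cocluster)^2."""
    mu = cocluster_means(A, rows, cols, k, l)
    return sum((a - mu[rows[i]][cols[j]]) ** 2
               for i, row in enumerate(A) for j, a in enumerate(row))


def best_row_cluster(A, i, cols, mu, k):
    # Cheapest row cluster for row i while the block means stay fixed.
    return min(range(k), key=lambda r: sum(
        (a - mu[r][cols[j]]) ** 2 for j, a in enumerate(A[i])))


def best_col_cluster(A, j, rows, mu, l):
    # Cheapest column cluster for column j while the block means stay fixed.
    return min(range(l), key=lambda c: sum(
        (A[i][j] - mu[rows[i]][c]) ** 2 for i in range(len(A))))


def alternating_cocluster(A, k, l, max_iter=100, tol=1e-12, seed=0):
    """Hypothetical sketch: alternate row and column reassignment.

    Uses random initialization only; binormalization, spectral
    initialization and incremental local search are not implemented.
    Returns row labels, column labels and the objective after each pass.
    """
    rng = random.Random(seed)
    rows = [rng.randrange(k) for _ in range(len(A))]
    cols = [rng.randrange(l) for _ in range(len(A[0]))]
    history = [sum_squared_residue(A, rows, cols, k, l)]
    for _ in range(max_iter):
        # Row step: reassign rows against the current block means.
        mu = cocluster_means(A, rows, cols, k, l)
        rows = [best_row_cluster(A, i, cols, mu, k) for i in range(len(A))]
        # Column step: recompute means, then reassign columns.
        mu = cocluster_means(A, rows, cols, k, l)
        cols = [best_col_cluster(A, j, rows, mu, l) for j in range(len(A[0]))]
        history.append(sum_squared_residue(A, rows, cols, k, l))
        # Stop once a full pass no longer lowers the objective.
        if history[-2] - history[-1] <= tol:
            break
    return rows, cols, history
```

For instance, `alternating_cocluster(A, 2, 2)` on a small expression matrix returns row labels, column labels, and the objective recorded after each pass. Because each step reassigns rows (or columns) with the block means held fixed and then recomputes the means, the recorded objective never increases; it can, however, settle in a local minimum that depends on the random starting labels.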
- Topics:
  - Bioinformatics
  - Co-Clustering
Citation
- Co-clustering of Human Cancer Microarrays using Minimum Sum-Squared Residue Co-clustering
H. Cho, I. Dhillon.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 5(3), pp. 385-400, July 2008.