Diametrical Clustering for identifying Anti-Correlated Gene Clusters

Inderjit Dhillon, Edward Marcotte, Usman Roshan

Abstract:

Motivation: Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways.

Results: We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster; each singular vector serves as the prototype of a ‘diametric’ cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
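The two alternating steps named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released software. It assumes that each gene's expression profile is mean-centered and scaled to unit length, and that a gene is assigned to the prototype with which its squared correlation is largest, so that strongly positive and strongly negative correlations count equally. The function name `diametrical_clustering`, its parameters, the random initialization, and the handling of empty clusters are all hypothetical choices made for this sketch.

```python
import numpy as np

def diametrical_clustering(X, k, n_iter=50, seed=0):
    """Sketch of diametrical clustering (hypothetical helper, not the authors' code).

    X : (n_genes, n_conditions) expression matrix; rows must not be constant.
    Returns (labels, prototypes) with prototypes of shape (k, n_conditions).
    """
    rng = np.random.default_rng(seed)
    # Assumption: center and scale each gene so that dot products between
    # rows equal Pearson correlations.
    X = X - X.mean(axis=1, keepdims=True)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    labels = rng.integers(k, size=X.shape[0])   # random initial partition
    prototypes = np.zeros((k, X.shape[1]))
    for _ in range(n_iter):
        # Step (ii): prototype = dominant singular vector of each cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                # Assumption: reseed an empty cluster with a random gene.
                prototypes[j] = X[rng.integers(X.shape[0])]
                continue
            # Rows are genes, so the first right singular vector lives in
            # condition space, like the gene profiles themselves.
            _, _, vt = np.linalg.svd(members, full_matrices=False)
            prototypes[j] = vt[0]
        # Step (i): re-partition by squared similarity, which ignores sign.
        new_labels = np.argmax((X @ prototypes.T) ** 2, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, prototypes
```

Calling `diametrical_clustering(X, k=2)` on a matrix whose rows follow two distinct patterns and their negations should place each pattern in the same cluster as its mirror image, because the squared similarity in the assignment step discards the sign of the correlation.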

Citation

  • Diametrical Clustering for identifying Anti-Correlated Gene Clusters (pdf, software)
    I. Dhillon, E. Marcotte, U. Roshan.
    University of Texas Computer Science Technical Report (UTCS Technical Report) TR-02-49, September 2002.