Class Visualization of High-Dimensional Data with Applications

Inderjit Dhillon, Dharmendra Modha, W. Spangler

Abstract: The problem of visualizing high-dimensional data that has been categorized into various classes is considered. The goal in visualizing is to quickly absorb inter-class and intra-class relationships. Towards this end, class-preserving projections of the multidimensional data onto two-dimensional planes, which can be displayed on a computer screen, are introduced. These class-preserving projections maintain the high-dimensional class structure, and are closely related to Fisher’s linear discriminants. By displaying sequences of such two-dimensional projections and by moving continuously from one projection to the next, an illusion of smooth motion through a multidimensional display can be created. Such sequences are called class tours. Furthermore, class-similarity graphs are overlaid on the two-dimensional projections to capture the distance relationships in the original high-dimensional space. The above visualization tools are illustrated on the classical Iris plant data, the ISOLET spoken letter data, and the PENDIGITS on-line handwriting data set. It is shown how the visual examination of the data can uncover latent class relationships.
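
To make the idea of a class-preserving projection concrete, the following is a minimal sketch and not the authors' software. It assumes one plausible construction that the abstract does not spell out: the two-dimensional plane is chosen to pass through the means of three selected classes, so the distances between those three class means are kept exactly. The function name `class_preserving_projection` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np

def class_preserving_projection(X, y, classes):
    """Project rows of X onto the plane through the means of three classes.

    Assumption (not taken from the abstract): the plane is spanned by the
    differences between the three class means.
    """
    means = [X[y == c].mean(axis=0) for c in classes]
    # Two direction vectors that span the plane through the three means.
    directions = np.stack([means[1] - means[0], means[2] - means[0]], axis=1)
    # Reduced QR gives an orthonormal basis Q of shape (d, 2) for that plane.
    Q, _ = np.linalg.qr(directions)
    return X @ Q, Q

# Synthetic stand-in data: three well-separated classes in 10 dimensions.
rng = np.random.default_rng(0)
d = 10
centers = 5.0 * rng.normal(size=(3, d))
X = np.vstack([c + rng.normal(size=(50, d)) for c in centers])
y = np.repeat([0, 1, 2], 50)

P, Q = class_preserving_projection(X, y, [0, 1, 2])

# Distances between class means before and after projection.
hi_means = [X[y == c].mean(axis=0) for c in (0, 1, 2)]
lo_means = [P[y == c].mean(axis=0) for c in (0, 1, 2)]
hi_dist = np.linalg.norm(hi_means[0] - hi_means[1])
lo_dist = np.linalg.norm(lo_means[0] - lo_means[1])
```

Because the differences between the class means lie in the plane spanned by `Q`, `hi_dist` and `lo_dist` agree up to floating-point error. The two columns of `P` can be passed to any scatter-plot routine, colored by `y`.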

Download: pdf

Citation

  • Class Visualization of High-Dimensional Data with Applications (pdf, software)
    I. Dhillon, D. Modha, W. Spangler.
    Computational Statistics & Data Analysis (Special issue on Matrix Computations & Statistics) 41(1), pp. 59-90, November 2002.
