Abstract: Feature selection is a basic step in the construction of a vector space or bag-of-words model [BB99]. In particular, when the processing task is to partition a given document collection into clusters of similar documents, a choice of good features along with good clustering algorithms is of paramount importance. This chapter suggests two techniques for feature or term selection along with a number of clustering strategies. The selection techniques significantly reduce the dimension of the vector space model. Examples that illustrate the effectiveness of the proposed algorithms are provided.
Topics: Data Clustering
Citation
- Feature Selection and Document Clustering
I. Dhillon, J. Kogan, M. Nicholas.
A Comprehensive Survey of Text Mining, pp. 73-100, January 2003.