Feature subspace transformations for enhancing K-means clustering

Anirban Chatterjee, Sanjukta Bhowmick, Padma Raghavan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citations

Abstract

Unsupervised classification typically concerns identifying clusters of similar entities in an unlabeled dataset. Popular methods include clustering based on (i) distance-based metrics between the entities in the feature space (K-Means), and (ii) combinatorial properties in a weighted graph representation of the dataset (Multilevel K-Means). In this paper, we present a force-directed graph layout based feature subspace transformation (FST) scheme to transform the dataset before the application of K-Means. Our FST-K-Means method utilizes both distance-based and combinatorial attributes of the original dataset to seek improvements in the internal and external quality metrics of unsupervised classification. We demonstrate the effectiveness of FST-K-Means in improving classification quality relative to K-Means and Multilevel K-Means (GraClus). The quality of classification is measured by observing internal and external quality metrics on a test suite of datasets. Our results indicate that on average, the internal quality metric (cluster cohesiveness) is 20.2% better than K-Means, and 6.6% better than GraClus. More significantly, FST-K-Means improves the external quality metric (accuracy) of classification on average by 14.9% relative to K-Means and 23.6% relative to GraClus.
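
The pipeline described in the abstract (build a graph from the data, reposition the points with a force-directed layout, then run K-Means on the repositioned points) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the graph construction, the force model, or the dimensionality of the transformed space, so the k-nearest-neighbour graph, the spring-plus-softened-repulsion forces, the farthest-point initialisation, and all names and parameter values below (`knn_edges`, `force_layout`, `kmeans`, `step`, `repel`, `soften`) are assumptions chosen only for illustration.

```python
import math


def knn_edges(points, k):
    """Return undirected k-nearest-neighbour edges as (i, j) pairs with i < j.
    Assumption: the abstract does not say how the graph is built."""
    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def force_layout(points, edges, iterations=30, step=0.1, repel=0.01, soften=0.01):
    """Hypothetical force model: spring attraction along graph edges plus a
    weak, softened repulsion between every pair of points."""
    pos = [list(p) for p in points]
    n, dim = len(pos), len(pos[0])
    for _ in range(iterations):
        disp = [[0.0] * dim for _ in range(n)]
        for i, j in edges:  # graph neighbours pull each other together
            for d in range(dim):
                delta = pos[j][d] - pos[i][d]
                disp[i][d] += delta
                disp[j][d] -= delta
        for i in range(n):  # every pair pushes apart, weakly
            for j in range(i + 1, n):
                delta = [pos[i][d] - pos[j][d] for d in range(dim)]
                scale = repel / (sum(x * x for x in delta) + soften)
                for d in range(dim):
                    disp[i][d] += scale * delta[d]
                    disp[j][d] -= scale * delta[d]
        for i in range(n):  # apply all displacements simultaneously
            for d in range(dim):
                pos[i][d] += step * disp[i][d]
    return pos


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: two well-separated groups of four points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
transformed = force_layout(data, knn_edges(data, k=2))
labels = kmeans(transformed, k=2)
print(labels)
```

On this toy input the layout step pulls graph neighbours closer together, so K-Means operates on tighter groups than in the original coordinates. The sketch says nothing about the quality improvements reported in the abstract, which were measured on the authors' own test suite.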

Original language: English (US)
Title of host publication: CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
Pages: 1801-1804
Number of pages: 4
State: Published - 2010
Event: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 - Toronto, ON, Canada
Duration: Oct 26 2010 to Oct 30 2010

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Country: Canada
City: Toronto, ON
Period: 10/26/10 to 10/30/10

Keywords

  • Feature subspace
  • Graph layout
  • K-means clustering

ASJC Scopus subject areas

  • Decision Sciences(all)
  • Business, Management and Accounting(all)
