A Bayesian approach to cluster validation

Hoyt A. Koepke, Bertrand Clarke

Research output: Contribution to conference › Paper

Abstract

In this paper, we propose a novel approach to validating clusterings. We treat a given clustering as a baseline and define a collection of perturbations of it, each of which may assign points to clusters differently. If these perturbations are indexed by a hyperparameter, integrating with respect to a prior on that hyperparameter gives an averaged assignment matrix. This matrix can be visualized as a heat map, allowing clusterings and their stability properties to be readily seen. The difference between the averaged assignment matrix and the baseline gives a measure of the stability of the baseline. This approach motivates a general and computationally fast algorithm for evaluating the stability of distance-based and exponential-model-type clusterings, including k-means. In addition, these stability criteria can be used to choose the number of clusters. Our method compares favorably with data-based perturbation procedures, such as subsampling, in some conditions, such as small sample sizes, and there is evidence that it outperforms subsampling methods on some problems.
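The idea of averaging perturbed assignments over a prior can be illustrated with a small Monte Carlo sketch. The abstract does not state the perturbation family, the prior, or the exact stability measure, so everything specific here is an assumption made for illustration: each centroid distance is multiplied by a log-normal factor (the hypothetical hyperparameter), the integral over the prior is approximated by sampling, and stability is scored as the average probability mass that stays on the baseline cluster. The function names are hypothetical and this is not the authors' algorithm.

```python
import math
import random


def distances(points, centroids):
    """Euclidean distance from every point to every centroid."""
    return [[math.dist(p, c) for c in centroids] for p in points]


def nearest(dist_row, scales):
    """Index of the nearest centroid after scaling each distance."""
    scaled = [d * s for d, s in zip(dist_row, scales)]
    return scaled.index(min(scaled))


def averaged_assignment(points, centroids, sigma=0.5, n_samples=2000, seed=0):
    """Monte Carlo estimate of the averaged assignment matrix.

    Assumption: the perturbation scales the distance to centroid k by
    exp(theta_k), with theta_k drawn independently from Normal(0, sigma^2).
    """
    rng = random.Random(seed)
    dist = distances(points, centroids)
    k = len(centroids)
    counts = [[0] * k for _ in points]
    for _ in range(n_samples):
        # One draw of the hyperparameter vector from the assumed prior.
        scales = [math.exp(rng.gauss(0.0, sigma)) for _ in range(k)]
        for i, row in enumerate(dist):
            counts[i][nearest(row, scales)] += 1
    # Each row holds the fraction of draws assigning that point to each cluster.
    return [[c / n_samples for c in row] for row in counts]


def stability(points, centroids, **kwargs):
    """Average mass left on the baseline cluster (1.0 means fully stable)."""
    avg = averaged_assignment(points, centroids, **kwargs)
    k = len(centroids)
    baseline = [nearest(row, [1.0] * k) for row in distances(points, centroids)]
    return sum(avg[i][b] for i, b in enumerate(baseline)) / len(points)


if __name__ == "__main__":
    centroids = [(0.05, 0.0), (10.05, 0.0)]
    points = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (5.0, 0.0)]
    for point, row in zip(points, averaged_assignment(points, centroids)):
        print(point, [round(v, 3) for v in row])
    print("stability:", round(stability(points, centroids), 3))
```

Each row of the returned matrix sums to one, so it can be passed directly to a heat-map plotting routine. Points near a cluster boundary spread their mass across clusters, while points deep inside a cluster keep nearly all of it on the baseline assignment.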

Original language: English (US)
Pages: 9P
Publication status: Published - Dec 1 2008
Event: 10th International Symposium on Artificial Intelligence and Mathematics, ISAIM 2008 - Fort Lauderdale, FL, United States
Duration: Jan 2 2008 to Jan 4 2008

Conference

Conference: 10th International Symposium on Artificial Intelligence and Mathematics, ISAIM 2008
Country: United States
City: Fort Lauderdale, FL
Period: 1/2/08 to 1/4/08

ASJC Scopus subject areas

  • Artificial Intelligence
  • Applied Mathematics

Cite this

Koepke, H. A., & Clarke, B. (2008). A Bayesian approach to cluster validation. 9P. Paper presented at 10th International Symposium on Artificial Intelligence and Mathematics, ISAIM 2008, Fort Lauderdale, FL, United States.