Genetic algorithm classifier system for semi-supervised learning

L. Dee Miller, Leen Kiat Soh, Stephen Scott

Research output: Contribution to journal (Article)

3 Scopus citations

Abstract

Real-world datasets often contain large numbers of unlabeled data points because obtaining labels incurs additional cost. Semi-supervised learning (SSL) algorithms train on both labeled and unlabeled data points, which can result in higher classification accuracy on these datasets. Generally, traditional SSLs tentatively label the unlabeled data points on the basis of the smoothness assumption that neighboring points should have the same label. When this assumption is violated, unlabeled points are mislabeled, injecting noise into the final classifier. An alternative SSL approach is cluster-then-label (CTL), which partitions all the data points (labeled and unlabeled) into clusters and creates a classifier by using those clusters. CTL is based on the less restrictive cluster assumption that data points in the same cluster should have the same label. As shown, this allows CTLs to achieve higher classification accuracy on many datasets where the cluster assumption holds for the CTLs but smoothness does not hold for the traditional SSLs. However, cluster configuration problems (e.g., irrelevant features, insufficient clusters, and incorrectly shaped clusters) could violate the cluster assumption. We propose a new framework for CTLs that uses a genetic algorithm (GA) to evolve classifiers without the cluster configuration problems (e.g., the GA removes irrelevant attributes, updates the number of clusters, and changes the shape of the clusters). We demonstrate that a CTL based on this framework achieves accuracy comparable to or higher than that of both traditional SSLs and CTLs on 12 University of California, Irvine machine learning datasets.
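
The cluster-then-label idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: it assumes plain k-means clustering with a fixed number of clusters and majority-vote labeling, and every name in it (kmeans, cluster_then_label, the toy data) is hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumption; the paper's clustering may differ)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def cluster_then_label(labeled, unlabeled, k, seed=0):
    """Cluster labeled and unlabeled points together, then label each cluster
    by majority vote of the labeled points that fall inside it."""
    points = [x for x, _ in labeled] + list(unlabeled)
    centers = kmeans(points, k, seed=seed)
    votes = [Counter() for _ in range(k)]
    for x, y in labeled:
        votes[nearest(x, centers)][y] += 1
    # Clusters with no labeled point fall back to the overall majority label.
    fallback = Counter(y for _, y in labeled).most_common(1)[0][0]
    cluster_label = [v.most_common(1)[0][0] if v else fallback for v in votes]

    def classify(x):
        return cluster_label[nearest(x, centers)]

    return classify


# Toy data (hypothetical): two well-separated groups, one labeled point each.
labeled = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
unlabeled = [(0.5, 0.2), (0.1, 0.9), (0.8, 0.4),
             (9.5, 10.2), (10.3, 9.8), (9.9, 9.1)]
classify = cluster_then_label(labeled, unlabeled, k=2)
```

In this sketch the number of clusters, the features used, and the cluster shape (spherical, as implied by k-means) are all fixed by hand. Those are the kinds of configuration choices that the abstract says the GA adjusts.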

Original language: English (US)
Pages (from-to): 201-232
Number of pages: 32
Journal: Computational Intelligence
Volume: 31
Issue number: 2
State: Published - May 1 2015

Keywords

  • cluster-then-label
  • genetic algorithm
  • semi-supervised learning
  • unsupervised clustering

ASJC Scopus subject areas

  • Computational Mathematics
  • Artificial Intelligence
