A Cross-Entropy Based Feature Selection Method for Binary Valued Data Classification

Zhipeng Wang, Qiuming Zhu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Feature selection is the process of finding a meaningful subset of attributes from a given set of measurements in order to reveal a coherent relation or causality in an event and thereby facilitate effective pattern classification. It can be treated as a preliminary step before constructing machine learning models in big data analytics, with the aim of improving prediction accuracy. Selecting the most significant features reduces training time and model complexity, helps avoid overfitting, and helps users better understand the source data and the modeling results. Though features are commonly treated as continuous values, many features in real-world machine learning applications are binary valued, i.e., either 1 or 0. Inspired by existing feature selection methods, this paper presents a new framework called FMC_SELECTOR, which specifically addresses the selection of significant binary-valued features from large, highly imbalanced datasets. FMC_SELECTOR combines Fisher linear discriminant analysis with a cross-entropy concept to create an integrated mapping function that evaluates each individual feature in a given dataset. A new formula called Mapping Based Cross-Entropy Evaluation (MCE) is derived. A Positive Case Prediction Score (PPS) is explored to verify the significance of the selected features in a classification process. FMC_SELECTOR is compared with two popular feature selection methods, Univariate Importance (UI) and Recursive Feature Elimination (RFM), and shows better performance on the datasets tested.
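
The abstract names the two ingredients of the method (a Fisher discriminant criterion and a cross-entropy measure) but does not give the MCE formula itself. As a rough illustration only, the sketch below scores binary features with a generic per-feature Fisher criterion and a generic binary cross-entropy on a synthetic imbalanced dataset. It is not FMC_SELECTOR or MCE; the function names `fisher_score` and `cross_entropy`, the synthetic data, and the noise rates are all assumptions made for this example.

```python
import numpy as np


def fisher_score(x, y):
    """Generic per-feature Fisher criterion: (mean1 - mean0)^2 / (var1 + var0)."""
    x1, x0 = x[y == 1], x[y == 0]
    num = (x1.mean() - x0.mean()) ** 2
    den = x1.var() + x0.var()
    return float(num / den) if den > 0 else 0.0


def cross_entropy(x, y, eps=1e-12):
    """Mean binary cross-entropy when y is predicted from P(y=1 | x).

    The probability is estimated separately for x == 0 and x == 1,
    so a lower value means the feature tells us more about the class.
    """
    p = np.empty(len(x), dtype=float)
    for v in (0, 1):
        mask = x == v
        if mask.any():
            p[mask] = y[mask].mean()
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Synthetic, highly imbalanced binary data (assumed: about 5% positive cases).
rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.05).astype(int)

# "informative" copies the label with 10% of entries flipped;
# "noise" is drawn independently of the label.
features = {
    "informative": np.where(rng.random(n) < 0.1, 1 - y, y),
    "noise": (rng.random(n) < 0.3).astype(int),
}

scores = {
    name: {"fisher": fisher_score(x, y), "cross_entropy": cross_entropy(x, y)}
    for name, x in features.items()
}

for name, s in scores.items():
    print(f"{name}: fisher={s['fisher']:.4f}, cross_entropy={s['cross_entropy']:.4f}")
```

In this toy setting the informative feature receives a higher Fisher score and a lower cross-entropy than the noise feature. How the paper combines the two quantities into a single mapping function is not stated in the abstract and is not attempted here.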

Original language: English (US)
Title of host publication: Intelligent Systems Design and Applications - 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
Editors: Ajith Abraham, Niketa Gandhi, Thomas Hanne, Tzung-Pei Hong, Tatiane Nogueira Rios, Weiping Ding
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 1406-1416
Number of pages: 11
ISBN (Print): 9783030963071
State: Published - 2022
Event: 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021 - Virtual, Online
Duration: Dec 13 2021 to Dec 15 2021

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 418 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
City: Virtual, Online
Period: 12/13/21 to 12/15/21

Keywords

  • Binary features
  • Classification
  • Cross entropy
  • Feature selection
  • Model verification

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
