A Cross-Entropy Based Feature Selection Method for Binary Valued Data Classification

Zhipeng Wang, Qiuming Zhu

Research output: Contribution to journal › Article › peer-review

Abstract

Feature selection is the process of finding a meaningful subset of attributes in a given set of measurements in order to reveal a coherent relation or causality in an event. The process is often indispensable for effective pattern classification. It is usually a preprocessing step carried out before constructing a machine learning model in big data analytics to improve the accuracy of predictive results. By selecting the most significant features, it can reduce training time and model complexity, avoid overfitting, and help users better understand the source data and the modeling outcomes. Although features are commonly treated as continuous values, many features in real-world machine learning applications are binary valued, i.e., either 1 or 0. Inspired by existing feature selection methods, this paper presents a new framework called FMC_SELECTOR, which specifically addresses the selection of significant binary-valued features from highly imbalanced large datasets. FMC_SELECTOR combines Fisher linear discriminant analysis with a cross-entropy mechanism to create an integrated mapping function for evaluating each individual feature in a given dataset. A new formulation called Mapping Based Cross-Entropy Evaluation (MCE) is derived for a quantitative ranking of the features. A Positive Case Prediction Score (PPS) is explored to verify the significance of the selected features in a classification process. The performance of FMC_SELECTOR is compared with two popular feature selection methods, Univariate Importance (UI) and Recursive Feature Elimination (RFE), and it performs better on the datasets tested.
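
The abstract does not state the MCE formula, so the following Python sketch is only an illustration of the two ingredients it names, applied to a single binary feature: a Fisher-style separation score and a cross-entropy between the class-conditional Bernoulli rates. It is not the authors' FMC_SELECTOR or MCE. The function names, the smoothing constant `EPS`, and the toy data are all assumptions made for this example.

```python
import math

EPS = 1e-9  # assumed smoothing constant, not taken from the paper


def class_rates(feature, labels):
    """Return (p1, p0): the fraction of 1s in the positive and negative class."""
    pos = [x for x, y in zip(feature, labels) if y == 1]
    neg = [x for x, y in zip(feature, labels) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def fisher_score(feature, labels):
    """Fisher criterion (mu1 - mu0)^2 / (var1 + var0) for one binary feature."""
    p1, p0 = class_rates(feature, labels)
    # A Bernoulli variable with rate p has variance p * (1 - p).
    return (p1 - p0) ** 2 / (p1 * (1 - p1) + p0 * (1 - p0) + EPS)


def bernoulli_cross_entropy(p, q):
    """Cross-entropy H(p, q) between two Bernoulli distributions, in nats."""
    q = min(max(q, EPS), 1 - EPS)  # clamp to avoid log(0)
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


# Toy, imbalanced data: 2 positive cases out of 8 (hypothetical values).
labels = [1, 1, 0, 0, 0, 0, 0, 0]
features = [
    [1, 1, 0, 0, 0, 0, 0, 0],  # feature 0 tracks the label exactly
    [1, 0, 1, 0, 1, 0, 1, 0],  # feature 1 is unrelated to the label
]

# Rank feature indices from most to least separating by the Fisher-style score.
scores = [fisher_score(f, labels) for f in features]
ranking = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
```

In this toy setup the feature that tracks the label ranks ahead of the unrelated one. How the paper actually combines the two quantities into a single mapping function is not recoverable from the abstract.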

Original language: English (US)
Pages (from-to): 226-238
Number of pages: 13
Journal: International Journal of Computer Information Systems and Industrial Management Applications
Volume: 14
State: Published - 2022

Keywords

  • Binary features
  • Cross entropy
  • Feature selection
  • Model verification
  • Pattern classification

ASJC Scopus subject areas

  • Management Information Systems
  • Signal Processing
  • Information Systems
  • Computer Vision and Pattern Recognition
  • Strategy and Management
  • Artificial Intelligence
