TY - GEN
T1 - A Cross-Entropy Based Feature Selection Method for Binary Valued Data Classification
AU - Wang, Zhipeng
AU - Zhu, Qiuming
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - Feature selection is the process of finding a meaningful subset of attributes from a given set of measurements in order to reveal a coherent relation or causality in an event and thereby facilitate effective pattern classification. It can be treated as a preprocessing step before constructing machine learning models in big data analytics to improve prediction accuracy. Selecting the most significant features reduces training time and model complexity, helps avoid overfitting, and helps users better understand the source data and the modeling results. Although features are commonly treated as continuous values, many features in real-world machine learning applications are binary valued, i.e., either 1 or 0. Inspired by existing feature selection methods, this paper presents a new framework called FMC_SELECTOR, which specifically addresses the selection of significant binary-valued features from large, highly imbalanced datasets. FMC_SELECTOR combines Fisher linear discriminant analysis with a cross-entropy concept to create an integrated mapping function that evaluates each individual feature in a given dataset. A new formula called Mapping Based Cross-Entropy Evaluation (MCE) is derived. A Positive Case Prediction Score (PPS) is explored to verify the significance of the selected features in a classification process. The performance of FMC_SELECTOR is compared with two popular feature selection methods, Univariate Importance (UI) and Recursive Feature Elimination (RFE), and it shows better performance on the datasets tested.
KW - Binary features
KW - Classification
KW - Cross entropy
KW - Feature selection
KW - Model verification
UR - http://www.scopus.com/inward/record.url?scp=85127662812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127662812&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-96308-8_130
DO - 10.1007/978-3-030-96308-8_130
M3 - Conference contribution
AN - SCOPUS:85127662812
SN - 9783030963071
T3 - Lecture Notes in Networks and Systems
SP - 1406
EP - 1416
BT - Intelligent Systems Design and Applications - 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
A2 - Abraham, Ajith
A2 - Gandhi, Niketa
A2 - Hanne, Thomas
A2 - Hong, Tzung-Pei
A2 - Nogueira Rios, Tatiane
A2 - Ding, Weiping
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
Y2 - 13 December 2021 through 15 December 2021
ER -