Predictive models with resampling: A comparative study of machine learning algorithms and their performances on handling imbalanced datasets

Adithi D. Chakravarthy, Sindhura Bonthu, Zhengxin Chen, Qiuming Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Class imbalance is a crucial challenge in many real-world machine learning applications. Traditional machine learning algorithms are likely to produce good accuracy scores on such datasets because of an obvious bias towards the majority class. Accuracy is therefore not a reliable measure of performance for algorithms working on imbalanced data, since the classifier can still have poor predictive accuracy over the minority class. While previous work has used several resampling techniques to improve the predictive accuracy of the minority class, in this study we explore and compare the effectiveness of the Synthetic Minority Oversampling Technique (SMOTE) and Random Oversampling over multiple learning algorithms and resampling ratios, using eight different performance measures on two datasets from diverse domains such as medicine and engineering. The results show that the effectiveness of these resampling techniques is a multivariate function of both the learning algorithms and the resampling ratios, as well as of the characteristics of the datasets. The choice of performance measures used to evaluate models built with these resampling techniques also varies, giving us more relevant information for future research and applications.
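The two ideas in the abstract, that accuracy can look good while the minority class is ignored and that oversampling is applied at a chosen resampling ratio, can be illustrated with a minimal sketch. It is not the authors' code: the toy data, the helper names (`random_oversample`, `accuracy`, `recall`), and the definition of the ratio as minority count divided by majority count are all assumptions made for illustration. Only Random Oversampling is shown; SMOTE differs in that it synthesizes new minority points by interpolating between neighbors rather than duplicating existing rows.

```python
import random
from collections import Counter


def random_oversample(X, y, minority_label, ratio, seed=0):
    """Duplicate randomly chosen minority rows until the minority count
    reaches int(ratio * majority count). Hypothetical helper; the ratio
    definition is an assumption, not taken from the paper."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    if not minority:
        raise ValueError("no minority samples to duplicate")
    n_majority = len(y) - len(minority)
    target = int(ratio * n_majority)          # desired minority count
    extra = max(0, target - len(minority))    # rows still to add
    picks = [rng.choice(minority) for _ in range(extra)]
    X_new = list(X) + [X[i] for i in picks]
    y_new = list(y) + [y[i] for i in picks]
    return X_new, y_new


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted positive."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total if total else 0.0


# Toy imbalanced dataset (made up): 95 majority (0) and 5 minority (1) samples.
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A classifier that always predicts the majority class.
always_majority = [0] * len(y)
baseline_accuracy = accuracy(y, always_majority)   # 0.95
minority_recall = recall(y, always_majority, 1)    # 0.0

# Oversample the minority class to half the size of the majority class.
X_res, y_res = random_oversample(X, y, minority_label=1, ratio=0.5)
counts = Counter(y_res)                            # 95 majority, 47 minority
```

The baseline scores 95% accuracy while recalling none of the minority samples, which is why a study of this kind reports several performance measures rather than accuracy alone. Resampling should be applied to training data only, so that evaluation reflects the original class distribution.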

Original language: English (US)
Title of host publication: Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
Editors: M. Arif Wani, Taghi M. Khoshgoftaar, Dingding Wang, Huanjing Wang, Naeem Seliya
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1492-1495
Number of pages: 4
ISBN (Electronic): 9781728145495
State: Published - Dec 2019
Event: 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019 - Boca Raton, United States
Duration: Dec 16 2019 to Dec 19 2019

Publication series

Name: Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019

Conference

Conference: 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
Country: United States
City: Boca Raton
Period: 12/16/19 to 12/19/19

Keywords

  • Class-imbalance
  • Classification
  • Oversampling
  • Predictive-models
  • Resampling
  • Undersampling

ASJC Scopus subject areas

  • Strategy and Management
  • Artificial Intelligence
  • Computer Science Applications
  • Decision Sciences (miscellaneous)
  • Signal Processing
  • Media Technology
