TY - JOUR
T1 - The robustness of popular multiclass machine learning models against poisoning attacks
T2 - Lessons and insights
AU - Maabreh, Majdi
AU - Maabreh, Arwa
AU - Qolomany, Basheer
AU - Al-Fuqaha, Ala
N1 - Publisher Copyright:
© The Author(s) 2022.
PY - 2022/7
Y1 - 2022/7
N2 - Despite the encouraging outcomes of machine learning and artificial intelligence applications, the safety of artificial intelligence–based systems is one of the most severe challenges that need further exploration. Data set poisoning is a serious problem that may lead to the corruption of machine learning models: the attacker injects faulty or mislabeled data into the training set by flipping the actual labels to incorrect ones. The term “robustness” refers to a machine learning algorithm’s ability to cope with such hostile situations. Here, instead of flipping labels at random, we use a clustering approach to choose the training samples whose labels are flipped, so as to influence both the classifiers’ performance and the capacity of distance-based anomaly detection to quarantine the poisoned samples. According to our experiments on a benchmark data set, random label flipping may have a short-term negative impact on the classifier’s accuracy, yet an anomaly filter would discover, on average, 63% of the poisoned samples. In contrast, the proposed clustering-based flipping can inject dormant poisoned samples that evade detection until their number is large enough to influence the classifiers’ performance severely; on average, the same anomaly filter would discover only 25% of them. We also highlight important lessons and observations from this experiment about the performance and robustness of popular multiclass learners against training data set–poisoning attacks, covering trade-offs, complexity, categories, poisoning resistance, and hyperparameter optimization.
AB - Despite the encouraging outcomes of machine learning and artificial intelligence applications, the safety of artificial intelligence–based systems is one of the most severe challenges that need further exploration. Data set poisoning is a serious problem that may lead to the corruption of machine learning models: the attacker injects faulty or mislabeled data into the training set by flipping the actual labels to incorrect ones. The term “robustness” refers to a machine learning algorithm’s ability to cope with such hostile situations. Here, instead of flipping labels at random, we use a clustering approach to choose the training samples whose labels are flipped, so as to influence both the classifiers’ performance and the capacity of distance-based anomaly detection to quarantine the poisoned samples. According to our experiments on a benchmark data set, random label flipping may have a short-term negative impact on the classifier’s accuracy, yet an anomaly filter would discover, on average, 63% of the poisoned samples. In contrast, the proposed clustering-based flipping can inject dormant poisoned samples that evade detection until their number is large enough to influence the classifiers’ performance severely; on average, the same anomaly filter would discover only 25% of them. We also highlight important lessons and observations from this experiment about the performance and robustness of popular multiclass learners against training data set–poisoning attacks, covering trade-offs, complexity, categories, poisoning resistance, and hyperparameter optimization.
KW - big data
KW - poisoning attack
KW - artificial intelligence safety
KW - clustering
KW - deep learning
KW - machine learning
KW - multiclass
UR - http://www.scopus.com/inward/record.url?scp=85134299552&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134299552&partnerID=8YFLogxK
U2 - 10.1177/15501329221105159
DO - 10.1177/15501329221105159
M3 - Article
AN - SCOPUS:85134299552
SN - 1550-1329
VL - 18
JO - International Journal of Distributed Sensor Networks
JF - International Journal of Distributed Sensor Networks
IS - 7
ER -