Abstract
We consider the problem of adversarial text generation in the context of cyber-security tasks such as email spam filtering and text classification for sentiment analysis on social media sites. In adversarial text generation, an adversary perturbs valid text data to produce adversarial text that is misclassified by a machine classifier. Many existing techniques for perturbing text data are gradient-based, or white-box, methods: the adversary observes the gradient of the classifier's loss function for a given input sample and uses this information to strategically select which portions of the text to perturb. Black-box methods, in which the adversary has no access to the loss gradient and must instead probe the classifier with different input samples to generate successful adversarial samples, have been used less often for adversarial text generation. In this paper, we integrate black-box methods, where the adversary has a limited budget on the number of probes to the classifier, with white-box, gradient-based methods, and evaluate the effectiveness of the adversarially generated text in misleading a deep network classifier.
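As an illustration of how these two attack modes can be combined, the sketch below uses white-box gradient saliency to rank token positions and then spends a limited black-box probe budget testing substitutions at those positions. This is a minimal sketch only, assuming a toy PyTorch classifier; the model, the `token_saliency` and `probe_attack` helpers, the random-substitution strategy, and all hyperparameters are hypothetical placeholders rather than the specific method evaluated in the paper.

```python
# Illustrative sketch (hypothetical model and parameters, not the paper's exact
# algorithm): white-box gradient saliency ranks the tokens most worth perturbing,
# then a black-box loop spends a limited budget of classifier probes trying
# substitutions at those positions until the prediction flips.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 1000, 32, 2

class ToyTextClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.fc = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, token_ids=None, embedded=None):
        # Accepts token ids or precomputed embeddings (needed for gradient saliency).
        if embedded is None:
            embedded = self.embed(token_ids)
        return self.fc(embedded.mean(dim=1))

def token_saliency(model, token_ids, label):
    """White-box step: L2 norm of the loss gradient w.r.t. each token embedding."""
    embedded = model.embed(token_ids).detach().requires_grad_(True)
    loss = F.cross_entropy(model(embedded=embedded), label)
    loss.backward()
    return embedded.grad.norm(dim=-1).squeeze(0)          # shape: (seq_len,)

def probe_attack(model, token_ids, label, probe_budget=20):
    """Black-box step: perturb high-saliency positions until the prediction flips
    or the probe budget is exhausted."""
    positions = token_saliency(model, token_ids, label).argsort(descending=True)
    adversarial = token_ids.clone()
    for probe in range(probe_budget):
        pos = positions[probe % len(positions)]
        adversarial[0, pos] = torch.randint(0, VOCAB_SIZE, (1,)).item()  # random substitution
        with torch.no_grad():
            pred = model(token_ids=adversarial).argmax(dim=1)
        if pred.item() != label.item():                    # misclassified: attack succeeded
            return adversarial, probe + 1
    return adversarial, probe_budget

model = ToyTextClassifier()
tokens = torch.randint(0, VOCAB_SIZE, (1, 10))             # one 10-token "sentence"
label = torch.tensor([1])
adv_tokens, probes_used = probe_attack(model, tokens, label)
print(f"Probes used: {probes_used}")
```

In a realistic attack the substitution step would draw from semantically plausible candidates (e.g., synonyms or nearby embeddings) rather than random vocabulary indices, and the probe budget bounds how many queries the black-box stage may issue against the target classifier.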
Original language | English (US) |
---|---|
Pages (from-to) | 17-23 |
Number of pages | 7 |
Journal | CEUR Workshop Proceedings |
Volume | 2269 |
State | Published - 2018 |
Event | 2018 AAAI Symposium on Adversary-Aware Learning Techniques and Trends in Cybersecurity, ALEC 2018 - Arlington, United States |
Duration | Oct 18 2018 → Oct 20 2018 |
ASJC Scopus subject areas
- General Computer Science