Gray-box techniques for adversarial text generation

Prithviraj Dasgupta, Joseph Collins, Anna Buhman

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations


We consider the problem of adversarial text generation in the context of cyber-security tasks such as email spam filtering and text classification for sentiment analysis on social media sites. In adversarial text generation, an adversary attempts to perturb valid text data to generate adversarial text that ends up being misclassified by a machine classifier. Many existing techniques for perturbing text data use gradient-based, or white-box, methods, where the adversary observes the gradient of the classifier's loss function for a given input sample and uses this information to strategically select portions of the text to perturb. On the other hand, black-box methods, where the adversary does not have access to the gradient of the loss function and instead has to probe the classifier with different input samples to generate successful adversarial samples, have been used less often for generating adversarial text. In this paper, we integrate black-box methods, in which the adversary has a limited budget for the number of probes to the classifier, with white-box, gradient-based methods, and evaluate how effectively the adversarially generated text misleads a deep network classifier model.
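The gray-box idea in the abstract, gradient information to choose where to perturb plus a limited budget of classifier probes to choose how, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' method: a toy linear bag-of-words spam classifier stands in for the deep network, the cross-entropy gradient ranks token positions (white-box step), and a hypothetical character-deletion perturbation set is searched with score queries counted against a probe budget (black-box step). The names `WEIGHTS`, `spam_prob`, `saliency`, `char_deletions` and `evade_spam_filter` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model: a linear bag-of-words spam classifier standing in
# for the deep network in the paper. Weights are made up for illustration.
WEIGHTS: Dict[str, float] = {"free": 2.0, "winner": 1.5, "cash": 1.2, "meeting": -1.0}
BIAS = -1.0


def spam_prob(tokens: List[str]) -> float:
    """P(spam | tokens) under the toy logistic model; unknown words weigh 0."""
    z = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def saliency(tokens: List[str], label: int) -> List[int]:
    """White-box step: rank positions by |dL/dx_t| = |(p - y) * w_t|,
    the cross-entropy gradient with respect to each token's input."""
    p = spam_prob(tokens)
    grads = [abs((p - label) * WEIGHTS.get(t, 0.0)) for t in tokens]
    return sorted(range(len(tokens)), key=lambda i: grads[i], reverse=True)


def char_deletions(word: str) -> List[str]:
    """Hypothetical perturbation set: every single-character deletion."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def evade_spam_filter(tokens: List[str], budget: int) -> Tuple[Optional[List[str]], int]:
    """Greedy gray-box attack on a spam message (true label 1).

    The gradient decides *where* to perturb; budget-limited score queries
    decide *how*. Returns (adversarial tokens or None, probes used).
    """
    adv = list(tokens)
    probes = 1
    current = spam_prob(adv)  # first black-box probe: score of the original
    for pos in saliency(tokens, label=1):
        if current < 0.5 or probes >= budget:
            break  # already misclassified as ham, or out of probes
        best_word = adv[pos]
        for cand in char_deletions(adv[pos]):
            if probes >= budget:
                break
            probes += 1
            p = spam_prob(adv[:pos] + [cand] + adv[pos + 1:])  # black-box probe
            if p < current:  # keep the substitution that lowers P(spam) most
                current, best_word = p, cand
        adv[pos] = best_word
    return (adv if current < 0.5 else None), probes


if __name__ == "__main__":
    msg = ["claim", "your", "free", "cash", "now"]
    print(evade_spam_filter(msg, budget=20))  # (['claim', 'your', 'ree', 'ash', 'now'], 9)
```

In this sketch gradient access is treated as free, while every score query counts against the budget. With `budget=20` the attack flips the label after perturbing the two most salient words ("free", then "cash") using 9 probes; with `budget=5` it spends all its probes on "free", the score stays above 0.5, and it returns `None`.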

Original language: English (US)
Pages (from-to): 17-23
Number of pages: 7
Journal: CEUR Workshop Proceedings
State: Published - 2018
Event: 2018 AAAI Symposium on Adversary-Aware Learning Techniques and Trends in Cybersecurity, ALEC 2018 - Arlington, United States
Duration: Oct 18 2018 to Oct 20 2018

ASJC Scopus subject areas

  • General Computer Science

