Staff View: A Generative Adversarial Attack for Multilingual Text Classifiers :: Library Catalog

Bibliographic Details
Main Authors:	Roth, Tom; Unanue, Inigo Jauregi; Abuadbba, Alsharif; Piccardi, Massimo
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2401.08255

_version_	1866917568426541056
author	Roth, Tom; Unanue, Inigo Jauregi; Abuadbba, Alsharif; Piccardi, Massimo
author_facet	Roth, Tom; Unanue, Inigo Jauregi; Abuadbba, Alsharif; Piccardi, Massimo
contents	Current adversarial attack algorithms, where an adversary changes a text to fool a victim model, have been repeatedly shown to be effective against text classifiers. These attacks, however, generally assume that the victim model is monolingual and cannot be used to target multilingual victim models, a significant limitation given the increased use of these models. For this reason, in this work we propose an approach to fine-tune a multilingual paraphrase model with an adversarial objective so that it becomes able to generate effective adversarial examples against multilingual classifiers. The training objective incorporates a set of pre-trained models to ensure text quality and language consistency of the generated text. In addition, all the models are suitably connected to the generator by vocabulary-mapping matrices, allowing for full end-to-end differentiability of the overall training pipeline. The experimental validation over two multilingual datasets and five languages has shown the effectiveness of the proposed approach compared to existing baselines, particularly in terms of query efficiency. We also provide a detailed analysis of the generated attacks and discuss limitations and opportunities for future research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_08255
institution	arXiv
publishDate	2024
record_format	arxiv
title	A Generative Adversarial Attack for Multilingual Text Classifiers
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2401.08255
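
Note on the abstract: it states that the pre-trained models are connected to the generator by vocabulary-mapping matrices, which keeps the training pipeline differentiable end to end. The record does not say how those matrices are built, so the following is only a minimal sketch of the general idea, under the assumption that each generator token is mapped to the victim token with the same string, with a fallback to an unknown-token column. The function names, the toy vocabularies, and the `[UNK]` convention are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the exact-string matching rule, the "[UNK]" fallback,
# and all names below are assumptions, not the construction used in the paper.

def build_mapping(gen_vocab, victim_vocab, unk="[UNK]"):
    """Return a |gen_vocab| x |victim_vocab| 0/1 matrix as nested lists.

    Row i has a single 1.0 in the column of the victim token that generator
    token i maps to (the unk column when there is no exact match).
    """
    col = {tok: j for j, tok in enumerate(victim_vocab)}
    unk_col = col[unk]
    matrix = []
    for tok in gen_vocab:
        row = [0.0] * len(victim_vocab)
        row[col.get(tok, unk_col)] = 1.0
        matrix.append(row)
    return matrix


def map_distribution(probs, matrix):
    """Compute the vector-matrix product probs @ matrix.

    The operation is linear in probs, so a gradient with respect to the
    victim-side distribution can flow back to the generator-side one.
    """
    n_cols = len(matrix[0])
    return [sum(p * row[j] for p, row in zip(probs, matrix)) for j in range(n_cols)]


# Toy vocabularies with different token orderings and one unmatched token.
gen_vocab = ["hola", "mundo", "bonjour", "xyz"]
victim_vocab = ["[UNK]", "bonjour", "hola", "mundo"]

M = build_mapping(gen_vocab, victim_vocab)

# A "soft" token: a probability distribution over the generator vocabulary.
soft_token = [0.5, 0.25, 0.125, 0.125]

mapped = map_distribution(soft_token, M)
print(mapped)  # [0.125, 0.125, 0.5, 0.25]
```

Because the mapping is a fixed matrix product rather than a hard token lookup, the probability mass is re-indexed into the victim vocabulary without any non-differentiable step. In a real pipeline this would typically be a sparse matrix applied to batched tensors in an automatic-differentiation framework.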

Similar Items