Bibliographic Details
Main Authors: Vu, Kiana, Lai, Phung, Nguyen, Truc
Format: Preprint
Published: 2024
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2409.08919
author Vu, Kiana
Lai, Phung
Nguyen, Truc
contents Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) has yet to reach its full potential in real-world applications. One key challenge is that XAI can unintentionally provide adversaries with insights into black-box models, inevitably increasing their vulnerability to various attacks. In this paper, we develop a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution, called XSub. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with corresponding important features from a "golden sample" of a different label, thereby increasing the likelihood of the model misclassifying the perturbed sample. The degree of feature substitution is adjustable, allowing us to control how much of the original sample's information is replaced. This flexibility effectively balances a trade-off between the attack's effectiveness and its stealthiness. XSub is also highly cost-effective in that the number of required queries to the prediction model and the explanation model in conducting the attack is O(1). In addition, XSub can be easily extended to launch backdoor attacks in case the attacker has access to the model's training data. Our evaluation demonstrates that XSub is not only effective and stealthy but also cost-effective, enabling its application across a wide range of AI models.
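The substitution step described in the abstract can be illustrated with a minimal sketch. It assumes that per-feature importance scores for the golden sample are already available from some XAI explainer, that "corresponding" features means the same feature positions, and that the adjustable degree of substitution is a fraction `ratio` of features to replace. The helper name `substitute_features` and all of these choices are hypothetical and are not taken from the paper.

```python
import numpy as np


def substitute_features(x, x_golden, golden_importance, ratio):
    # Hypothetical helper, not the authors' implementation.
    # Assumption: rank feature positions by the golden sample's importance
    # scores (from some XAI explainer) and copy the top `ratio` fraction of
    # the golden sample's values at those positions into a copy of x.
    x = np.asarray(x, dtype=float)
    x_golden = np.asarray(x_golden, dtype=float)
    scores = np.asarray(golden_importance, dtype=float)
    if x.shape != x_golden.shape or x.shape != scores.shape:
        raise ValueError("x, x_golden and golden_importance must share a shape")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")

    flat_scores = scores.ravel()
    k = int(round(ratio * flat_scores.size))  # number of features to replace
    # Indices of the k largest importance scores, most important first.
    top = np.argsort(-flat_scores, kind="stable")[:k]

    x_adv = x.ravel().copy()
    x_adv[top] = x_golden.ravel()[top]
    return x_adv.reshape(x.shape)


x = np.zeros(6)
x_golden = np.arange(1.0, 7.0)  # [1, 2, 3, 4, 5, 6]
importance = np.array([0.1, 0.9, 0.3, 0.8, 0.0, 0.5])
print(substitute_features(x, x_golden, importance, ratio=0.5))  # [0. 2. 0. 4. 0. 6.]
```

Raising `ratio` replaces more of the original sample, which in this sketch mirrors the effectiveness versus stealthiness trade-off the abstract describes: more substituted features make misclassification more likely but the perturbed sample less similar to the original.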
format Preprint
id arxiv_https___arxiv_org_abs_2409_08919
institution arXiv
publishDate 2024
record_format arxiv
title XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2409.08919