Bibliographic Details
Main Authors: Zhang, Hongwei, Xu, Xiaoyin, An, Dongsheng, Gu, Xianfeng, Zhang, Min
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2403.07463
author Zhang, Hongwei
Xu, Xiaoyin
An, Dongsheng
Gu, Xianfeng
Zhang, Min
contents Backdoor attacks have become a significant security concern for deep neural networks in recent years. An image classification model can be compromised if malicious backdoors are injected into it. This corruption causes the model to function normally on clean images but predict a specific target label when triggers are present. Previous research can be categorized into two genres: poisoning a portion of the dataset with triggered images for users to train the model from scratch, or training a backdoored model alongside a triggered image generator. Both approaches require a significant number of attackable parameters for optimization to establish a connection between the trigger and the target label, which may raise suspicions as more people become aware of the existence of backdoor attacks. In this paper, we propose a backdoor attack paradigm that requires only minimal alterations (specifically, to the output layer) of a clean model in order to inject the backdoor under the guise of fine-tuning. To achieve this, we leverage mode mixture samples, which are located between different modes in latent space, and introduce a novel method for conducting backdoor attacks. We evaluate the effectiveness of our method on four popular benchmark datasets: MNIST, CIFAR-10, GTSRB, and TinyImageNet.
format Preprint
id arxiv_https___arxiv_org_abs_2403_07463
institution arXiv
publishDate 2024
record_format arxiv
title Backdoor Attack with Mode Mixture Latent Modification
topic Cryptography and Security
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2403.07463