Bibliographic Details
Main Authors: Hu, Xixu; Zheng, Runkai; Wang, Jindong; Leung, Cheuk Hang; Wu, Qi; Xie, Xing
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2402.03317
_version_ 1866913429441216512
author Hu, Xixu
Zheng, Runkai
Wang, Jindong
Leung, Cheuk Hang
Wu, Qi
Xie, Xing
contents Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern. Existing methods lack a solid theoretical basis, focusing mainly on empirical training adjustments. This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings. We establish local Lipschitz bounds for the self-attention layer and propose the Maximum Singular Value Penalization (MSVP) to precisely manage these bounds. By incorporating MSVP into ViTs' attention layers, we enhance the model's robustness without compromising training efficiency. SpecFormer, the resulting model, outperforms other state-of-the-art models in defending against adversarial attacks, as shown by experiments on CIFAR and ImageNet datasets. Code is released at https://github.com/microsoft/robustlearn.
format Preprint
id arxiv_https___arxiv_org_abs_2402_03317
institution arXiv
publishDate 2024
record_format arxiv
title SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2402.03317