Staff View: SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization :: Library Catalog

Bibliographic Details
Main Authors:	Hu, Xixu; Zheng, Runkai; Wang, Jindong; Leung, Cheuk Hang; Wu, Qi; Xie, Xing
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2402.03317

_version_	1866913429441216512
author	Hu, Xixu; Zheng, Runkai; Wang, Jindong; Leung, Cheuk Hang; Wu, Qi; Xie, Xing
author_facet	Hu, Xixu; Zheng, Runkai; Wang, Jindong; Leung, Cheuk Hang; Wu, Qi; Xie, Xing
contents	Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern. Existing methods lack a solid theoretical basis, focusing mainly on empirical training adjustments. This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings. We establish local Lipschitz bounds for the self-attention layer and propose the Maximum Singular Value Penalization (MSVP) to precisely manage these bounds. By incorporating MSVP into ViTs' attention layers, we enhance the model's robustness without compromising training efficiency. SpecFormer, the resulting model, outperforms other state-of-the-art models in defending against adversarial attacks, as proven by experiments on CIFAR and ImageNet datasets. Code is released at https://github.com/microsoft/robustlearn.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_03317
institution	arXiv
publishDate	2024
record_format	arxiv
title	SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
topic	Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2402.03317
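
Note on the abstract: the record names Maximum Singular Value Penalization (MSVP) but does not give its formula. The sketch below is only an illustration of the general idea of penalizing the largest singular value of weight matrices, not the authors' implementation. It estimates the largest singular value by power iteration in plain Python. The function names, the `coeff` parameter, and the choice to sum the values are all assumptions made for this example; the released code at the URL above is the authoritative reference.

```python
import math
import random


def matvec(a, x):
    """Multiply matrix a (a list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transpose(a):
    """Return the transpose of matrix a as a list of rows."""
    return [list(col) for col in zip(*a)]


def norm(x):
    """Euclidean norm of vector x."""
    return math.sqrt(sum(v * v for v in x))


def max_singular_value(a, iters=200, seed=0):
    """Estimate the largest singular value of a by power iteration on a^T a."""
    rng = random.Random(seed)
    at = transpose(a)
    # Random start vector with one entry per column of a.
    v = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        n = norm(w)
        if n == 0.0:
            # a maps v to zero; for an all-zero matrix this is exact.
            return 0.0
        v = [x / n for x in w]
    # v approximates the top right singular vector, so ||a v|| approximates sigma_max.
    return norm(matvec(a, v))


def msv_penalty(weights, coeff=1e-3):
    """Hypothetical penalty term: coeff times the sum of largest singular values.

    The exact form used by MSVP is an assumption here, not taken from the record.
    """
    return coeff * sum(max_singular_value(w) for w in weights)
```

In a training loop, a term like `msv_penalty([w_q, w_k, w_v])` would be added to the task loss for each attention layer, where `w_q`, `w_k`, and `w_v` are placeholder names for that layer's projection matrices. A real implementation would use an autodiff framework so the penalty contributes gradients.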

Similar Items