Staff View: Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression :: Library Catalog

Bibliographic Details
Main Authors:	Ye, Hancheng; Yu, Chong; Ye, Peng; Xia, Renqiu; Tang, Yansong; Lu, Jiwen; Chen, Tao; Zhang, Bo
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.15835

_version_	1866917621129019392
author	Ye, Hancheng; Yu, Chong; Ye, Peng; Xia, Renqiu; Tang, Yansong; Lu, Jiwen; Chen, Tao; Zhang, Bo
author_facet	Ye, Hancheng; Yu, Chong; Ye, Peng; Xia, Renqiu; Tang, Yansong; Lu, Jiwen; Chen, Tao; Zhang, Bo
contents	Recent Vision Transformer Compression (VTC) works mainly follow a two-stage scheme, where the importance score of each model unit is first evaluated or preset in each submodule, followed by the sparsity score evaluation according to the target sparsity constraint. Such a separate evaluation process induces a gap between the importance and sparsity score distributions, thus causing high search costs for VTC. In this work, for the first time, we investigate how to integrate the evaluations of importance and sparsity scores into a single stage, searching for the optimal subnets efficiently. Specifically, we present Once for Both (OFB), a cost-efficient approach for VTC that evaluates importance and sparsity scores simultaneously. First, a bi-mask scheme is developed by entangling the importance score and the differentiable sparsity score to jointly determine the pruning potential (prunability) of each unit. This bi-mask search strategy is then combined with a proposed adaptive one-hot loss to realize a progressive and efficient search for the most important subnet. Finally, Progressive Masked Image Modeling (PMIM) is proposed to regularize the feature space, which may be degraded by the dimension reduction, so that it remains more representative during the search process. Extensive experiments demonstrate that OFB achieves superior compression performance over state-of-the-art searching-based and pruning-based methods across various Vision Transformer architectures while significantly improving search efficiency, e.g., costing one GPU search day for the compression of DeiT-S on ImageNet-1K.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_15835
institution	arXiv
publishDate	2024
record_format	arxiv
title	Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.15835

Similar Items