Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Jin, Zhi, Qiu, Yuwei, Zhang, Kaihao, Li, Hongdong, Luo, Wenhan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.04486
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866915241709797376
author	Jin, Zhi Qiu, Yuwei Zhang, Kaihao Li, Hongdong Luo, Wenhan
author_facet	Jin, Zhi Qiu, Yuwei Zhang, Kaihao Li, Hongdong Luo, Wenhan
contents	Recently, Transformer networks have demonstrated outstanding performance in the field of image restoration due to the global receptive field and adaptability to input. However, the quadratic computational complexity of Softmax-attention poses a significant limitation on its extensive application in image restoration tasks, particularly for high-resolution images. To tackle this challenge, we propose a novel variant of the Transformer. This variant leverages the Taylor expansion to approximate the Softmax-attention and utilizes the concept of norm-preserving mapping to approximate the remainder of the first-order Taylor expansion, resulting in a linear computational complexity. Moreover, we introduce a multi-branch architecture featuring multi-scale patch embedding into the proposed Transformer, which has four distinct advantages: 1) various sizes of the receptive field; 2) multi-level semantic information; 3) flexible shapes of the receptive field; 4) accelerated training and inference speed. Hence, the proposed model, named the second version of Taylor formula expansion-based Transformer (for short MB-TaylorFormer V2) has the capability to concurrently process coarse-to-fine features, capture long-distance pixel interactions with limited computational cost, and improve the approximation of the Taylor expansion remainder. Experimental results across diverse image restoration benchmarks demonstrate that MB-TaylorFormer V2 achieves state-of-the-art performance in multiple image restoration tasks, such as image dehazing, deraining, desnowing, motion deblurring, and denoising, with very little computational overhead. The source code is available at https://github.com/FVL2020/MB-TaylorFormerV2.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_04486
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration Jin, Zhi Qiu, Yuwei Zhang, Kaihao Li, Hongdong Luo, Wenhan Computer Vision and Pattern Recognition Recently, Transformer networks have demonstrated outstanding performance in the field of image restoration due to the global receptive field and adaptability to input. However, the quadratic computational complexity of Softmax-attention poses a significant limitation on its extensive application in image restoration tasks, particularly for high-resolution images. To tackle this challenge, we propose a novel variant of the Transformer. This variant leverages the Taylor expansion to approximate the Softmax-attention and utilizes the concept of norm-preserving mapping to approximate the remainder of the first-order Taylor expansion, resulting in a linear computational complexity. Moreover, we introduce a multi-branch architecture featuring multi-scale patch embedding into the proposed Transformer, which has four distinct advantages: 1) various sizes of the receptive field; 2) multi-level semantic information; 3) flexible shapes of the receptive field; 4) accelerated training and inference speed. Hence, the proposed model, named the second version of Taylor formula expansion-based Transformer (for short MB-TaylorFormer V2) has the capability to concurrently process coarse-to-fine features, capture long-distance pixel interactions with limited computational cost, and improve the approximation of the Taylor expansion remainder. Experimental results across diverse image restoration benchmarks demonstrate that MB-TaylorFormer V2 achieves state-of-the-art performance in multiple image restoration tasks, such as image dehazing, deraining, desnowing, motion deblurring, and denoising, with very little computational overhead. The source code is available at https://github.com/FVL2020/MB-TaylorFormerV2.
title	MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2501.04486

Similar Items