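Staff View: Joint Multi-scale Gated Transformer and Prior-guided Convolutional Network for Learned Image Compression :: Library Catalog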

Bibliographic Details
Main Authors:	Chen, Zhengxin; He, Xiaohai; Zhang, Tingrong; Xiong, Shuhua; Ren, Chao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.00744

author	Chen, Zhengxin; He, Xiaohai; Zhang, Tingrong; Xiong, Shuhua; Ren, Chao
contents	Learned image compression methods have recently achieved remarkable results, and some of them outperform the traditional image codec VVC (Versatile Video Coding). Much of their advantage over traditional codecs can be attributed to powerful nonlinear transform coding. Convolutional layers and shifted window transformer (Swin-T) blocks are the basic building blocks of these networks, and their representational capacity plays an important role in nonlinear transform coding. In this paper, to improve the ability of vanilla convolution to extract local features, we propose a novel prior-guided convolution (PGConv), in which asymmetric convolutions (AConvs) strengthen skeleton elements and difference convolutions (DConvs) extract high-frequency information. A re-parameterization strategy is also used to reduce the computational complexity of PGConv. Moreover, to improve the ability of the Swin-T block to extract non-local features, we propose a novel multi-scale gated transformer (MGT), in which dilated window-based multi-head self-attention blocks with different dilation rates and depth-wise convolution layers with different kernel sizes extract multi-scale features, and a gate mechanism enhances non-linearity. Finally, we combine these components into a novel joint Multi-scale Gated Transformer and Prior-guided Convolutional Network (MGTPCN) for learned image compression. Experimental results show that MGTPCN surpasses state-of-the-art algorithms while offering a better trade-off between performance and complexity.
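The re-parameterization idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' PGConv: it assumes single-channel inputs, zero padding, no batch normalization or bias, and a hypothetical branch set of one 3×3 convolution, one 1×3 and one 3×1 asymmetric convolution, and one central difference convolution (the record does not say which DConv variants PGConv uses). All function names are illustrative. Because convolution is linear in its kernel, the parallel training-time branches can be folded into a single 3×3 kernel for inference, and the script checks that both paths give the same output.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation, zero padding, 'same' output size.
    The kernel must have odd height and width (e.g. 3x3, 1x3, 3x1)."""
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:
                        acc += k[a][b] * x[r][c]
            out[i][j] = acc
    return out


def central_diff_conv(x: Matrix, k: Matrix) -> Matrix:
    """Central difference convolution: y(p) = sum_n k(n) * (x(p+n) - x(p)).
    Out-of-bounds neighbours are treated as zero (same padding as above)."""
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    nb = x[r][c] if 0 <= r < h and 0 <= c < w else 0.0
                    acc += k[a][b] * (nb - x[i][j])
            out[i][j] = acc
    return out


def pgconv_train(x: Matrix, k_sq: Matrix, k_h: List[float],
                 k_v: List[float], k_cd: Matrix) -> Matrix:
    """Hypothetical training-time block: sum of four parallel branches."""
    branches = [
        conv2d_same(x, k_sq),                 # vanilla 3x3
        conv2d_same(x, [list(k_h)]),          # asymmetric 1x3 (horizontal)
        conv2d_same(x, [[v] for v in k_v]),   # asymmetric 3x1 (vertical)
        central_diff_conv(x, k_cd),           # difference convolution
    ]
    h, w = len(x), len(x[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


def reparameterize(k_sq: Matrix, k_h: List[float],
                   k_v: List[float], k_cd: Matrix) -> Matrix:
    """Fold all branches into one 3x3 kernel for inference."""
    row = [[0.0] * 3, list(k_h), [0.0] * 3]          # 1x3 placed in middle row
    col = [[0.0, v, 0.0] for v in k_v]               # 3x1 placed in middle column
    cd = [r[:] for r in k_cd]
    cd[1][1] -= sum(sum(r) for r in k_cd)            # -x(p)*sum(k) moves to centre
    kernels = [k_sq, row, col, cd]
    return [[sum(k[a][b] for k in kernels) for b in range(3)] for a in range(3)]


if __name__ == "__main__":
    x = [[float((3 * i + 5 * j) % 7) for j in range(5)] for i in range(4)]
    k_sq = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
    k_h = [0.4, -0.6, 0.2]
    k_v = [-0.3, 0.7, 0.1]
    k_cd = [[0.05, 0.1, -0.2], [0.3, -0.4, 0.15], [0.0, 0.25, -0.1]]
    y_train = pgconv_train(x, k_sq, k_h, k_v, k_cd)
    y_infer = conv2d_same(x, reparameterize(k_sq, k_h, k_v, k_cd))
    err = max(abs(a - b) for ra, rb in zip(y_train, y_infer) for a, b in zip(ra, rb))
    print(f"max |train - merged| = {err:.2e}")
```

The design point is that the extra branches cost compute only during training; at inference a single 3×3 convolution runs. A real layer would also fold batch-normalization statistics and biases into the merged kernel, which this sketch deliberately omits.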
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_00744
institution	arXiv
publishDate	2025
record_format	arxiv
title	Joint Multi-scale Gated Transformer and Prior-guided Convolutional Network for Learned Image Compression
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.00744

Similar Items