Staff View: Efficient Masked Image Compression with Position-Indexed Self-Attention :: Library Catalog

Bibliographic Details
Main Authors:	Dai, Chengjie; Song, Tiantian; Tang, Hui; Chen, Fangdong; Yang, Bowei; Song, Guanghua
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.12923

_version_	1866908324502437888
author	Dai, Chengjie; Song, Tiantian; Tang, Hui; Chen, Fangdong; Yang, Bowei; Song, Guanghua
author_facet	Dai, Chengjie; Song, Tiantian; Tang, Hui; Chen, Fangdong; Yang, Bowei; Song, Guanghua
contents	In recent years, image compression for high-level vision tasks has attracted considerable attention from researchers. Given that object information in images plays a far more crucial role in downstream tasks than background information, some studies have proposed semantically structuring the bitstream to selectively transmit and reconstruct only the information required by these tasks. However, such methods structure the bitstream after encoding, meaning that the coding process still relies on the entire image, even though much of the encoded information will not be transmitted. This leads to redundant computations. Traditional image compression methods require a two-dimensional image as input, and even if the unimportant regions of the image are set to zero by applying a semantic mask, these regions still participate in subsequent computations as part of the image. To address such limitations, we propose an image compression method based on a position-indexed self-attention mechanism that encodes and decodes only the visible parts of the masked image. Compared to existing semantic-structured compression methods, our approach can significantly reduce computational costs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_12923
institution	arXiv
publishDate	2025
record_format	arxiv
title	Efficient Masked Image Compression with Position-Indexed Self-Attention
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.12923
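
Illustrative note: the abstract in the contents field describes encoding only the visible parts of a masked image, with each retained part carrying its position. The following is a minimal sketch of that general idea, not the authors' implementation. The patch size, the sinusoidal position code, the identity projections in the attention step, and all function names (visible_patch_indices, position_code, self_attention, encode_visible) are assumptions made for illustration.

```python
import math


def visible_patch_indices(mask, patch):
    """Return flat grid indices of patches that contain at least one visible pixel."""
    rows, cols = len(mask) // patch, len(mask[0]) // patch
    keep = []
    for r in range(rows):
        for c in range(cols):
            block = (mask[r * patch + i][c * patch + j]
                     for i in range(patch) for j in range(patch))
            if any(block):
                keep.append(r * cols + c)
    return keep


def position_code(index, cols, dim):
    """Hypothetical sinusoidal code derived from the patch's (row, col) grid position."""
    r, c = divmod(index, cols)
    return [math.sin((r + 1) * (k + 1)) + math.cos((c + 1) * (k + 1))
            for k in range(dim)]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    if not tokens:
        return []
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def encode_visible(image, mask, patch):
    """Attend over visible patches only; masked patches never enter the computation."""
    cols = len(image[0]) // patch
    dim = patch * patch
    keep = visible_patch_indices(mask, patch)
    tokens = []
    for idx in keep:
        r, c = divmod(idx, cols)
        pixels = [float(image[r * patch + i][c * patch + j])
                  for i in range(patch) for j in range(patch)]
        # The position index tells each token where it sat in the original grid.
        pos = position_code(idx, cols, dim)
        tokens.append([p + e for p, e in zip(pixels, pos)])
    return keep, self_attention(tokens)


# Toy 4x4 image split into a 2x2 grid of 2x2 patches; two patches are visible.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
mask = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
keep, encoded = encode_visible(image, mask, patch=2)
total_patches = (len(image) // 2) * (len(image[0]) // 2)
print("visible patch indices:", keep)
print("attention pairs:", len(keep) ** 2, "instead of", total_patches ** 2)
```

In this toy case the attention step compares only the retained tokens with each other, so its cost grows with the square of the number of visible patches rather than the square of the full patch grid. The kept indices would need to be available to a decoder so that reconstructed patches can be placed back at their original positions.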

Similar Items