Enregistré dans:
| Auteurs principaux: | , , |
|---|---|
| Format: | Preprint |
| Publié: |
2025
|
| Sujets: | |
| Accès en ligne: | https://arxiv.org/abs/2503.00399 |
| Tags: |
Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
|
| _version_ | 1866915551963512832 |
|---|---|
| author | Song, Juan Yang, Lijie Feng, Mingtao |
| author_facet | Song, Juan Yang, Lijie Feng, Mingtao |
| contents | It remains a significant challenge to compress images at extremely low bitrate while achieving both semantic consistency and high perceptual quality. Inspired by human progressive perception mechanism, we propose a Semantically Disentangled Image Compression framework (SEDIC) in this paper. Initially, an extremely compressed reference image is obtained through a learned image encoder. Then we leverage LMMs to extract essential semantic components, including overall descriptions, object detailed description, and semantic segmentation masks. We propose a training-free Object Restoration model with Attention Guidance (ORAG) built on pre-trained ControlNet to restore object details conditioned by object-level text descriptions and semantic masks. Based on the proposed ORAG, we design a multistage semantic image decoder to progressively restore the details object by object, starting from the extremely compressed reference image, ultimately generating high-quality and high-fidelity reconstructions. Experimental results demonstrate that SEDIC significantly outperforms state-of-the-art approaches, achieving superior perceptual quality and semantic consistency at extremely low-bitrates ($\le$ 0.05 bpp). |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2503_00399 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Extremely low-bitrate Image Compression Semantically Disentangled by LMMs from a Human Perception Perspective Song, Juan Yang, Lijie Feng, Mingtao Computer Vision and Pattern Recognition It remains a significant challenge to compress images at extremely low bitrate while achieving both semantic consistency and high perceptual quality. Inspired by human progressive perception mechanism, we propose a Semantically Disentangled Image Compression framework (SEDIC) in this paper. Initially, an extremely compressed reference image is obtained through a learned image encoder. Then we leverage LMMs to extract essential semantic components, including overall descriptions, object detailed description, and semantic segmentation masks. We propose a training-free Object Restoration model with Attention Guidance (ORAG) built on pre-trained ControlNet to restore object details conditioned by object-level text descriptions and semantic masks. Based on the proposed ORAG, we design a multistage semantic image decoder to progressively restore the details object by object, starting from the extremely compressed reference image, ultimately generating high-quality and high-fidelity reconstructions. Experimental results demonstrate that SEDIC significantly outperforms state-of-the-art approaches, achieving superior perceptual quality and semantic consistency at extremely low-bitrates ($\le$ 0.05 bpp). |
| title | Extremely low-bitrate Image Compression Semantically Disentangled by LMMs from a Human Perception Perspective |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2503.00399 |