:: Library Catalog

Bibliographic Details
Main Authors:	Song, Juan; Yang, Lijie; Feng, Mingtao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.00399

Similar Items

High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion
by: Song, Juan, et al.
Published: (2024)

Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs
by: Liu, Jinming, et al.
Published: (2024)

LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression
by: Murai, Shimon, et al.
Published: (2024)

Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer
by: Xue, Naifu, et al.
Published: (2024)

FLaTEC: Frequency-Disentangled Latent Triplanes for Efficient Compression of LiDAR Point Clouds
by: Zhang, Xiaoge, et al.
Published: (2025)

SLIM: Semantic-based Low-bitrate Image compression for Machines by leveraging diffusion
by: Lee, Hyeonjin, et al.
Published: (2025)

Towards image compression with perfect realism at ultra-low bitrates
by: Careil, Marlène, et al.
Published: (2023)

Fine color guidance in diffusion models and its application to image compression at extremely low bitrates
by: Bordin, Tom, et al.
Published: (2024)

Perception Without Engagement: Dissecting the Causal Discovery Deficit in LMMs
by: Liang, Jiafeng, et al.
Published: (2026)

Teaching LMMs for Image Quality Scoring and Interpreting
by: Zhang, Zicheng, et al.
Published: (2025)

All-in-One Transferring Image Compression from Human Perception to Multi-Machine Perception
by: Zhao, Jiancheng, et al.
Published: (2025)

VisualCritic: Making LMMs Perceive Visual Quality Like Humans
by: Huang, Zhipeng, et al.
Published: (2024)

LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs
by: Yang, Woo Yi, et al.
Published: (2025)

Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication
by: Lyu, Hanjia, et al.
Published: (2024)

LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs
by: Xu, Zitong, et al.
Published: (2025)

Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation
by: Yang, Zhichao, et al.
Published: (2025)

MMGenBench: Fully Automatically Evaluating LMMs from the Text-to-Image Generation Perspective
by: Huang, Hailang, et al.
Published: (2024)

LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
by: Wang, Jiarui, et al.
Published: (2025)

Distributed Image Compression with Multimodal Side Information at Extremely Low Bitrates
by: Xu, Guojun, et al.
Published: (2026)

HiSem: Hierarchical Semantic Disentangling for Remote Sensing Image Change Captioning
by: Wang, Man, et al.
Published: (2026)

MIBench: Evaluating LMMs on Multimodal Interaction
by: Miao, Yu, et al.
Published: (2026)

A Framework for Generating Semantically Ambiguous Images to Probe Human and Machine Perception
by: Hu, Yuqi, et al.
Published: (2026)

ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
by: Xie, Yin, et al.
Published: (2024)

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
by: Zhang, Zicheng, et al.
Published: (2024)

Learning to Wander: Improving the Global Image Geolocation Ability of LMMs via Actionable Reasoning
by: Zheng, Yushuo, et al.
Published: (2026)

Hierarchical Semantic Compression for Consistent Image Semantic Restoration
by: Li, Shengxi, et al.
Published: (2025)

Efficiently Disentangling CLIP for Multi-Object Perception
by: Rawlekar, Samyak, et al.
Published: (2025)

Disentangled Human Body Representation Based on Unsupervised Semantic-Aware Learning
by: Wang, Lu, et al.
Published: (2025)

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
by: Li, Hongxiang, et al.
Published: (2024)

MMSearch-R1: Incentivizing LMMs to Search
by: Wu, Jinming, et al.
Published: (2025)

Visually-Guided Controllable Medical Image Generation via Fine-Grained Semantic Disentanglement
by: Huang, Xin, et al.
Published: (2026)

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
by: Pang, Hui En, et al.
Published: (2024)

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
by: Zhang, Kaichen, et al.
Published: (2024)

Map-Assisted Remote-Sensing Image Compression at Extremely Low Bitrates
by: Ye, Yixuan, et al.
Published: (2024)

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
by: Meng, Lingchen, et al.
Published: (2024)

Infrared and Visible Image Fusion with Hierarchical Human Perception
by: Yang, Guang, et al.
Published: (2024)

Noise Dimension of GAN: An Image Compression Perspective
by: Zhu, Ziran, et al.
Published: (2024)

UniCoRN: Unified Commented Retrieval Network with LMMs
by: Jaritz, Maximilian, et al.
Published: (2025)

Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
by: Zhang, Zicheng, et al.
Published: (2024)

VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
by: Bharadwaj, Rohit, et al.
Published: (2024)