Bibliographic Details
Main Authors: Liu, Jinming; Lin, Junyan; Wei, Yuntao; Shao, Kele; Tao, Keda; Huang, Jianguo; Yang, Xudong; Chen, Zhibo; Wang, Huan; Jin, Xin
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2508.13460
Table of Contents:
  • Classical visual coding and Multimodal Large Language Model (MLLM) token technology share a core objective: maximizing information fidelity while minimizing computational cost. This paper therefore reexamines MLLM token technology, including tokenization, token compression, and token reasoning, through the established principles of the long-developed field of visual coding. From this perspective, we (1) establish a unified formulation bridging token technology and visual coding, enabling a systematic, module-by-module comparative analysis; (2) synthesize bidirectional insights, exploring how visual coding principles can enhance the efficiency and robustness of MLLM token techniques and, conversely, how token technology paradigms can inform the design of next-generation semantic visual codecs; and (3) outline promising future research directions and critical unsolved challenges. In summary, this study presents the first comprehensive and structured comparison of MLLM token technology and visual coding, paving the way for both more efficient multimodal models and more powerful visual codecs.