Staff View: Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey :: Library Catalog

Bibliographic Details
Main Authors:	Li, Jindong; Fu, Yali; Liu, Jiahong; Cao, Linxiao; Ji, Wei; Yang, Menglin; King, Irwin; Yang, Ming-Hsuan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.22920

_version_	1866915418181992448
author	Li, Jindong; Fu, Yali; Liu, Jiahong; Cao, Linxiao; Ji, Wei; Yang, Menglin; King, Irwin; Yang, Ming-Hsuan
author_facet	Li, Jindong; Fu, Yali; Liu, Jiahong; Cao, Linxiao; Ji, Wei; Yang, Menglin; King, Irwin; Yang, Ming-Hsuan
contents	The rapid advancement of large language models (LLMs) has intensified the need for effective mechanisms to transform continuous multimodal data into discrete representations suitable for language-based processing. Discrete tokenization, with vector quantization (VQ) as a central approach, offers both computational efficiency and compatibility with LLM architectures. Despite its growing importance, there is a lack of a comprehensive survey that systematically examines VQ techniques in the context of LLM-based systems. This work fills this gap by presenting the first structured taxonomy and analysis of discrete tokenization methods designed for LLMs. We categorize 8 representative VQ variants that span classical and modern paradigms and analyze their algorithmic principles, training dynamics, and integration challenges with LLM pipelines. Beyond algorithm-level investigation, we discuss existing research in terms of classical applications without LLMs, LLM-based single-modality systems, and LLM-based multimodal systems, highlighting how quantization strategies influence alignment, reasoning, and generation performance. In addition, we identify key challenges including codebook collapse, unstable gradient estimation, and modality-specific encoding constraints. Finally, we discuss emerging research directions such as dynamic and task-adaptive quantization, unified tokenization frameworks, and biologically inspired codebook learning. This survey bridges the gap between traditional vector quantization and modern LLM applications, serving as a foundational reference for the development of efficient and generalizable multimodal systems. A continuously updated version is available at: https://github.com/jindongli-Ai/LLM-Discrete-Tokenization-Survey.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_22920
institution	arXiv
publishDate	2025
record_format	arxiv
title	Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2507.22920

Similar Items