| Main Authors: | Yang, Sicheng; Hu, Xing; Wu, Qiang; Yang, Dawei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.06863 |
Similar Items
VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation
by: Yang, Sicheng, et al.
Published: (2026)
Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
by: Wang, Bohan, et al.
Published: (2025)
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
by: Li, Duo, et al.
Published: (2025)
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
by: Jin, Yang, et al.
Published: (2023)
MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
by: Zhang, Luyuan, et al.
Published: (2026)
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
by: Wang, Yuqing, et al.
Published: (2026)
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
by: Wang, Yuqing, et al.
Published: (2025)
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
by: Wu, Daiqing, et al.
Published: (2025)
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
by: Xie, Tianyu, et al.
Published: (2025)
ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming
by: Zhuang, Jiedong, et al.
Published: (2024)
K-Stain: Keypoint-Driven Correspondence for H&E-to-IHC Virtual Staining
by: Yang, Sicheng, et al.
Published: (2025)
Information Entropy Guided Height-aware Histogram for Quantization-friendly Pillar Feature Encoder
by: Zhou, Sifan, et al.
Published: (2024)
On the Role of Discrete Tokenization in Visual Representation Learning
by: Du, Tianqi, et al.
Published: (2024)
FlowCoMotion: Text-to-Motion Generation via Token-Latent Flow Modeling
by: Guan, Dawei, et al.
Published: (2026)
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
by: Zhuang, Shaobin, et al.
Published: (2025)
Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations
by: Liang, Yiwen, et al.
Published: (2025)
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
by: Lou, Haoran, et al.
Published: (2025)
When Token Pruning is Worse than Random: Understanding Visual Token Information in VLLMs
by: Wang, Yahong, et al.
Published: (2025)
A Survey of Token Compression for Efficient Multimodal Large Language Models
by: Shao, Kele, et al.
Published: (2025)
MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality
by: Yang, Panqi, et al.
Published: (2026)
Revisiting MLLM Token Technology through the Lens of Classical Visual Coding
by: Liu, Jinming, et al.
Published: (2025)
MGVQ: Synergizing Multi-dimensional Sensitivity-Aware and Gradient-Hessian Fusion for Vector Quantization
by: Wang, Zhong, et al.
Published: (2026)
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
by: Tan, Xudong, et al.
Published: (2025)
EntropyPrune: Matrix Entropy Guided Visual Token Pruning for Multimodal Large Language Models
by: Wang, Yahong, et al.
Published: (2026)
Wave-Particle (Continuous-Discrete) Dualistic Visual Tokenization for Unified Understanding and Generation
by: Chen, Yizhu, et al.
Published: (2025)
SegDINO: An Efficient Design for Medical and Natural Image Segmentation with DINO-V3
by: Yang, Sicheng, et al.
Published: (2025)
CoordSpeaker: Exploiting Gesture Captioning for Coordinated Caption-Empowered Co-Speech Gesture Generation
by: Fang, Fengyi, et al.
Published: (2025)
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
by: Gu, Chenyang, et al.
Published: (2026)
HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models
by: Zhu, Qihui, et al.
Published: (2026)
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
by: Zhang, Bowen, et al.
Published: (2025)
TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance
by: Yang, Pei, et al.
Published: (2025)
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
by: Li, Duo, et al.
Published: (2025)
Discrete JEPA: Learning Discrete Token Representations without Reconstruction
by: Baek, Junyeob, et al.
Published: (2025)
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
by: Meituan LongCat Team, et al.
Published: (2026)
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
by: Li, Jungang, et al.
Published: (2024)
TokenFLEX: Unified VLM Training for Flexible Visual Tokens Inference
by: Hu, Junshan, et al.
Published: (2025)
See the Text: From Tokenization to Visual Reading
by: Xing, Ling, et al.
Published: (2025)
TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression
by: Zeng, Sen, et al.
Published: (2026)
VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction
by: Hu, Yu, et al.
Published: (2025)
Scaling Video Pretraining for Surgical Foundation Models
by: Lu, Sicheng, et al.
Published: (2026)