| Main Authors: | Chen, Wei, Li, Zhiyuan, Xin, Shuo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.11475 |
Similar Items
SwiftVLM: Efficient Vision-Language Model Inference via Cross-Layer Token Bypass
by: Qian, Chen, et al.
Published: (2026)
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
by: Zhang, Yuan, et al.
Published: (2024)
HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
by: Zhang, Jusheng, et al.
Published: (2025)
OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models
by: Yang, Morunliu, et al.
Published: (2026)
SpaceVLM: Sub-Space Modeling of Negation in Vision-Language Models
by: Ranjbar, Sepehr Kazemi, et al.
Published: (2025)
OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning
by: Li, Geng, et al.
Published: (2026)
Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models
by: Tan, Xudong, et al.
Published: (2025)
CS-VLM: Compressed Sensing Attention for Efficient Vision-Language Representation Learning
by: Kiruluta, Andrew, et al.
Published: (2025)
UniCompress: Token Compression for Unified Vision-Language Understanding and Generation
by: Wang, Ziyao, et al.
Published: (2026)
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)
PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks
by: Cui, Cheng, et al.
Published: (2026)
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
by: Chu, Xiangxiang, et al.
Published: (2023)
DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
by: Tao, Keda, et al.
Published: (2025)
LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
by: Hu, Lianyu, et al.
Published: (2025)
Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference
by: Bagrov, Natan, et al.
Published: (2025)
TRIO: Token Reduction via Inference-Objective Guidance for Efficient Vision-Language Models
by: Zhang, Haokui, et al.
Published: (2026)
Scaling Learned Image Compression Models up to 1 Billion
by: Li, Yuqi, et al.
Published: (2025)
Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
by: Cao, Sihan, et al.
Published: (2026)
TokenFLEX: Unified VLM Training for Flexible Visual Tokens Inference
by: Hu, Junshan, et al.
Published: (2025)
VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
by: Wu, Zhenkai, et al.
Published: (2025)
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
by: Xiong, Tianwei, et al.
Published: (2025)
IIR-VLM: In-Context Instance-level Recognition for Large Vision-Language Models
by: Shi, Liang, et al.
Published: (2026)
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
by: Wang, Han, et al.
Published: (2024)
Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)
GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models
by: Huang, Mingzhe, et al.
Published: (2026)
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
by: Wang, Xiao, et al.
Published: (2025)
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
by: Zhu, Jiaying, et al.
Published: (2025)
Scaling Diffusion Transformers to 16 Billion Parameters
by: Fei, Zhengcong, et al.
Published: (2024)
FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
by: Liu, Yuanqing, et al.
Published: (2026)
EvoCut: Multi-Layer Evolution-Aware Visual Token Compression for Efficient Large Vision-Language Models
by: Lu, Hongyu, et al.
Published: (2026)
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)
FastVLM: Efficient Vision Encoding for Vision Language Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
by: Wang, Mengyue, et al.
Published: (2025)
A Survey of Token Compression for Efficient Multimodal Large Language Models
by: Shao, Kele, et al.
Published: (2025)
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
by: Shao, Rui, et al.
Published: (2025)
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
by: Li, Qingyun, et al.
Published: (2024)
TrojVLM: Backdoor Attack Against Vision Language Models
by: Lyu, Weimin, et al.
Published: (2024)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
by: She, Chaoyin, et al.
Published: (2025)