| Main Authors: | Zeng, Sen, Zhou, Hong, Zhu, Zheng, Liu, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.04519 |
Similar Items
VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation
by: Yang, Sicheng, et al.
Published: (2026)
SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation
by: Lu, Yujie, et al.
Published: (2026)
TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation
by: Xia, Zunhui, et al.
Published: (2025)
SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation
by: Perera, Shehan, et al.
Published: (2024)
Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning
by: Hu, Xinrong, et al.
Published: (2024)
SegDINO: An Efficient Design for Medical and Natural Image Segmentation with DINO-V3
by: Yang, Sicheng, et al.
Published: (2025)
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
by: Xing, Zhaohu, et al.
Published: (2024)
Medical Referring Image Segmentation via Next-Token Mask Prediction
by: Chen, Xinyu, et al.
Published: (2025)
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
by: Zhu, Jiaying, et al.
Published: (2025)
Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
by: Jin, Xin, et al.
Published: (2026)
Prompt-based Dynamic Token Pruning for Efficient Segmentation of Medical Images
by: Dutta, Pallabi, et al.
Published: (2025)
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
by: Bai, Zechen, et al.
Published: (2024)
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
by: Tan, Xudong, et al.
Published: (2025)
SegResMamba: An Efficient Architecture for 3D Medical Image Segmentation
by: Das, Badhan Kumar, et al.
Published: (2025)
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
by: Zhu, Yuke, et al.
Published: (2024)
SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation
by: Tan, Shengbo, et al.
Published: (2024)
MedPruner: Training-Free Hierarchical Token Pruning for Efficient 3D Medical Image Understanding in Vision-Language Models
by: Liu, Shengyuan, et al.
Published: (2026)
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
by: Wang, Yiyu, et al.
Published: (2025)
BiSegMamba: Efficient Bidirectional Tri-Oriented Mamba for 3D Medical Image Segmentation
by: Zada, Bakht, et al.
Published: (2026)
TM-UNet: Token-Memory Enhanced Sequential Modeling for Efficient Medical Image Segmentation
by: Jiao, Yaxuan, et al.
Published: (2025)
Seg-VAR: Image Segmentation with Visual Autoregressive Modeling
by: Zheng, Rongkun, et al.
Published: (2025)
Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder
by: Jisheng, Dang, et al.
Published: (2025)
HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models
by: Zhang, Liheng, et al.
Published: (2025)
Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning
by: Ma, Yinchao, et al.
Published: (2026)
LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models
by: Liu, Juntao, et al.
Published: (2025)
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation
by: Zhang, Xu, et al.
Published: (2026)
MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation
by: Tang, Fenghe, et al.
Published: (2024)
Efficient Multi-modal Large Language Models via Visual Token Grouping
by: Huang, Minbin, et al.
Published: (2024)
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
by: Wei, Cong, et al.
Published: (2024)
Aligning Text, Images, and 3D Structure Token-by-Token
by: Sahoo, Aadarsh, et al.
Published: (2025)
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
by: Li, Bonan, et al.
Published: (2025)
An Efficient Token Compression Framework for Visual Object Tracking
by: Wu, Weijing, et al.
Published: (2026)
LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information
by: Wang, Ke, et al.
Published: (2024)
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
by: Yang, Chenyu, et al.
Published: (2024)
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
by: Wu, Peiran, et al.
Published: (2025)
SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues
by: Xie, Yuxin, et al.
Published: (2024)
TokenPacker: Efficient Visual Projector for Multimodal LLM
by: Li, Wentong, et al.
Published: (2024)
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
by: Li, Shuai, et al.
Published: (2025)
Scaling Mesh Generation via Compressive Tokenization
by: Weng, Haohan, et al.
Published: (2024)
TCFormer: Visual Recognition via Token Clustering Transformer
by: Zeng, Wang, et al.
Published: (2024)