:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Longrong; Shen, Dong; Cai, Chaoxiang; Yang, Fan; Gao, Tingting; Zhang, Di; Li, Xi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.19905

Similar Items

Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model
by: Cai, Chaoxiang, et al.
Published: (2025)

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization
by: Jia, Chenwei, et al.
Published: (2026)

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
by: Lin, Bin, et al.
Published: (2024)

Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
by: Cao, Sihan, et al.
Published: (2026)

MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models
by: Wang, Dianyi, et al.
Published: (2025)

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
by: Yu, Jiazuo, et al.
Published: (2024)

Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts
by: Chen, Xi, et al.
Published: (2025)

SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts
by: Liu, Jiaqi, et al.
Published: (2025)

Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts
by: He, Xin, et al.
Published: (2025)

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation
by: Dong, Shaoqi, et al.
Published: (2025)

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
by: Shen, Leyang, et al.
Published: (2024)

Resolving Task Objective Conflicts in Unified Model via Task-Aware Mixture-of-Experts
by: Zhang, Jiaxing, et al.
Published: (2025)

DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models
by: Qin, Mengxin, et al.
Published: (2026)

Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
by: Wen, Zichen, et al.
Published: (2025)

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
by: Zhang, Jusheng, et al.
Published: (2025)

EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens
by: Feng, Ze, et al.
Published: (2025)

Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts
by: Chen, Xi, et al.
Published: (2026)

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection
by: Song, Fenghao, et al.
Published: (2026)

Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models
by: Wang, Peiran, et al.
Published: (2025)

Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)

MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
by: Wang, Chao, et al.
Published: (2025)

LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition
by: Zhou, Rixin, et al.
Published: (2025)

EVLM: An Efficient Vision-Language Model for Visual Understanding
by: Chen, Kaibing, et al.
Published: (2024)

Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
by: Chen, Qizhou, et al.
Published: (2024)

Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens
by: Kim, Sohee, et al.
Published: (2025)

MoVA: Adapting Mixture of Vision Experts to Multimodal Context
by: Zong, Zhuofan, et al.
Published: (2024)

TAME: Test-Time Adversarial Prompt Tuning via Mixture-of-Experts for Vision-Language Models
by: Wang, Xin, et al.
Published: (2026)

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
by: Liang, Xiao, et al.
Published: (2025)

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
by: Zhang, Yue, et al.
Published: (2024)

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models
by: Cai, Kaitong, et al.
Published: (2025)

EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
by: She, Chaoyin, et al.
Published: (2025)

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
by: Yang, Chenyu, et al.
Published: (2024)

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)

SEMC: Structure-Enhanced Mixture-of-Experts Contrastive Learning for Ultrasound Standard Plane Recognition
by: Cai, Qing, et al.
Published: (2025)

Beyond Surrogate Gradients: Fully Differentiable Token Pruning for Vision-Language Models
by: He, Landi, et al.
Published: (2026)

LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
by: Wang, Yimu, et al.
Published: (2025)

Towards Vision Mixture of Experts for Wildlife Monitoring on the Edge
by: Mensah, Emmanuel Azuh, et al.
Published: (2024)

Rethinking Efficient Mixture-of-Experts for Remote Sensing Modality-Missing Classification
by: Gao, Qinghao, et al.
Published: (2025)

A Survey of Token Compression for Efficient Multimodal Large Language Models
by: Shao, Kele, et al.
Published: (2025)

From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
by: Shang, Yuying, et al.
Published: (2024)