| Main Authors: | Chen, Qinghui, Zhang, Zekai, Zhang, Zaigui, Zhang, Kai, Li, Dagang, Wang, Wenmin, Zhang, Jinglin, Liu, Cong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.26735 |
Similar Items
Dynamic Eraser for Guided Concept Erasure in Diffusion Models
by: Gong, Qinghui
Published: (2026)
Sparse Shortcuts: Facilitating Efficient Fusion in Multimodal Large Language Models
by: Zhang, Jingrui, et al.
Published: (2026)
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
by: Zhang, Wenqiao, et al.
Published: (2024)
Towards Principled Dataset Distillation: A Spectral Distribution Perspective
by: Wu, Ruixi, et al.
Published: (2026)
Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models
by: Zhou, Zijie, et al.
Published: (2026)
Physical Prompt Injection Attacks on Large Vision-Language Models
by: Ling, Chen, et al.
Published: (2026)
Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization
by: Jia, Chenwei, et al.
Published: (2026)
Your Vision-Language Model Can't Even Count to 20: Exposing the Failures of VLMs in Compositional Counting
by: Guo, Xuyang, et al.
Published: (2025)
TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge
by: Zhang, Shu-Hao, et al.
Published: (2025)
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
by: Yin, Jianghao, et al.
Published: (2026)
ResDiff: Combining CNN and Diffusion Model for Image Super-Resolution
by: Shang, Shuyao, et al.
Published: (2023)
PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration
by: Huang, Xiaoshui, et al.
Published: (2025)
NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation
by: Zhu, Weiye, et al.
Published: (2026)
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
by: He, Yefei, et al.
Published: (2024)
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
by: Zhang, Rongyu, et al.
Published: (2024)
Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
by: Chen, Qian, et al.
Published: (2026)
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
by: Yu, Xiaomin, et al.
Published: (2026)
VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification
by: Liu, Jianmeng, et al.
Published: (2024)
Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism
by: Chen, Tao, et al.
Published: (2026)
MammothModa: Multi-Modal Large Language Model
by: She, Qi, et al.
Published: (2024)
Geodesics with Unified Tangent-constrained Priors and Curvature Regularization
by: Di, Chong, et al.
Published: (2026)
GTMA: Dynamic Representation Optimization for OOD Vision-Language Models
by: Zhang, Jensen, et al.
Published: (2025)
Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
by: Chen, Junkai, et al.
Published: (2026)
Semantic Communication based on Large Language Model for Underwater Image Transmission
by: Chen, Weilong, et al.
Published: (2024)
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
by: Zhang, Yuanhong, et al.
Published: (2026)
DRScaffold: Boosting Dense-Scene Reasoning in Lightweight Vision Language Models
by: Shi, Xinrui, et al.
Published: (2026)
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
by: Tao, Hongyuan, et al.
Published: (2025)
Dynamic Exploration on Segment-Proposal Graphs for Tubular Centerline Tracking
by: Di, Chong, et al.
Published: (2025)
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
by: Zhang, Naifu, et al.
Published: (2025)
MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
by: Zhang, Jinhao, et al.
Published: (2025)
Accelerating Diffusion Models with One-to-Many Knowledge Distillation
by: Zhang, Linfeng, et al.
Published: (2024)
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
by: Wang, Kai, et al.
Published: (2024)
Large Language Model-Driven Distributed Integrated Multimodal Sensing and Semantic Communications
by: Peng, Yubo, et al.
Published: (2025)
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models
by: Dai, Muzhi, et al.
Published: (2025)
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
by: Zhang, Wenchuan, et al.
Published: (2025)
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
by: Yang, Rui, et al.
Published: (2025)
X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning
by: Shao, Maanping, et al.
Published: (2026)
Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation
by: Shi, Jin, et al.
Published: (2026)
Advancing High Resolution Vision-Language Models in Biomedicine
by: Chen, Zekai, et al.
Published: (2024)
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
by: Zhang, Zicheng, et al.
Published: (2024)