:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Qinghui, Zhang, Zekai, Zhang, Zaigui, Zhang, Kai, Li, Dagang, Wang, Wenmin, Zhang, Jinglin, Liu, Cong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.26735

Similar Items

Dynamic Eraser for Guided Concept Erasure in Diffusion Models
by: Gong, Qinghui
Published: (2026)

Sparse Shortcuts: Facilitating Efficient Fusion in Multimodal Large Language Models
by: Zhang, Jingrui, et al.
Published: (2026)

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
by: Zhang, Wenqiao, et al.
Published: (2024)

Towards Principled Dataset Distillation: A Spectral Distribution Perspective
by: Wu, Ruixi, et al.
Published: (2026)

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models
by: Zhou, Zijie, et al.
Published: (2026)

Physical Prompt Injection Attacks on Large Vision-Language Models
by: Ling, Chen, et al.
Published: (2026)

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization
by: Jia, Chenwei, et al.
Published: (2026)

Your Vision-Language Model Can't Even Count to 20: Exposing the Failures of VLMs in Compositional Counting
by: Guo, Xuyang, et al.
Published: (2025)

TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge
by: Zhang, Shu-Hao, et al.
Published: (2025)

Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
by: Yin, Jianghao, et al.
Published: (2026)

ResDiff: Combining CNN and Diffusion Model for Image Super-Resolution
by: Shang, Shuyao, et al.
Published: (2023)

PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration
by: Huang, Xiaoshui, et al.
Published: (2025)

NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation
by: Zhu, Weiye, et al.
Published: (2026)

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
by: He, Yefei, et al.
Published: (2024)

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
by: Zhang, Rongyu, et al.
Published: (2024)

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
by: Chen, Qian, et al.
Published: (2026)

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
by: Yu, Xiaomin, et al.
Published: (2026)

VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification
by: Liu, Jianmeng, et al.
Published: (2024)

Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism
by: Chen, Tao, et al.
Published: (2026)

MammothModa: Multi-Modal Large Language Model
by: She, Qi, et al.
Published: (2024)

Geodesics with Unified Tangent-constrained Priors and Curvature Regularization
by: Di, Chong, et al.
Published: (2026)

GTMA: Dynamic Representation Optimization for OOD Vision-Language Models
by: Zhang, Jensen, et al.
Published: (2025)

Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
by: Chen, Junkai, et al.
Published: (2026)

Semantic Communication based on Large Language Model for Underwater Image Transmission
by: Chen, Weilong, et al.
Published: (2024)

Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
by: Zhang, Yuanhong, et al.
Published: (2026)

DRScaffold: Boosting Dense-Scene Reasoning in Lightweight Vision Language Models
by: Shi, Xinrui, et al.
Published: (2026)

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
by: Tao, Hongyuan, et al.
Published: (2025)

Dynamic Exploration on Segment-Proposal Graphs for Tubular Centerline Tracking
by: Di, Chong, et al.
Published: (2025)

Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
by: Zhang, Naifu, et al.
Published: (2025)

MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
by: Zhang, Jinhao, et al.
Published: (2025)

Accelerating Diffusion Models with One-to-Many Knowledge Distillation
by: Zhang, Linfeng, et al.
Published: (2024)

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
by: Wang, Kai, et al.
Published: (2024)

Large Language Model-Driven Distributed Integrated Multimodal Sensing and Semantic Communications
by: Peng, Yubo, et al.
Published: (2025)

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models
by: Dai, Muzhi, et al.
Published: (2025)

Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
by: Zhang, Wenchuan, et al.
Published: (2025)

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
by: Yang, Rui, et al.
Published: (2025)

X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning
by: Shao, Maanping, et al.
Published: (2026)

Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation
by: Shi, Jin, et al.
Published: (2026)

Advancing High Resolution Vision-Language Models in Biomedicine
by: Chen, Zekai, et al.
Published: (2024)

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
by: Zhang, Zicheng, et al.
Published: (2024)