:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Ziliang; Huang, Xin; Guan, Quanlong; Lin, Liang; Luo, Weiqi
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2511.00191

Similar Items

Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models
by: Fu, Shuai, et al.
Published: (2024)

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
by: Li, Lin, et al.
Published: (2024)

OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
by: Kim, Kwanyoung, et al.
Published: (2024)

RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models
by: Lin, Xiang, et al.
Published: (2025)

Differentiable Prompt Learning for Vision Language Models
by: Huang, Zhenhan, et al.
Published: (2024)

DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving
by: Hu, Haibo, et al.
Published: (2025)

Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models
by: Huang, Chengyue, et al.
Published: (2025)

VLLFL: A Vision-Language Model Based Lightweight Federated Learning Framework for Smart Agriculture
by: Li, Long, et al.
Published: (2025)

VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
by: Luo, Run, et al.
Published: (2025)

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
by: Yang, Senqiao, et al.
Published: (2025)

Multi-Modal Adapter for Vision-Language Models
by: Seputis, Dominykas, et al.
Published: (2024)

Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
by: Zhao, Yunhan, et al.
Published: (2024)

Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique
by: Zhang, Zhehan, et al.
Published: (2026)

Retrospective Learning from Interactions
by: Chen, Zizhao, et al.
Published: (2024)

Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models
by: Lian, Chenyu, et al.
Published: (2025)

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
by: Li, Qixiu, et al.
Published: (2025)

Open-CRB: Towards Open World Active Learning for 3D Object Detection
by: Chen, Zhuoxiao, et al.
Published: (2023)

Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents
by: Zhang, Zhizhen, et al.
Published: (2025)

Composition Vision-Language Understanding via Segment and Depth Anything Model
by: Huo, Mingxiao, et al.
Published: (2024)

FastVLM: Efficient Vision Encoding for Vision Language Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)

To Trust Or Not To Trust Your Vision-Language Model's Prediction
by: Dong, Hao, et al.
Published: (2025)

PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation
by: Wahed, Muntasir, et al.
Published: (2024)

Prismer: A Vision-Language Model with Multi-Task Experts
by: Liu, Shikun, et al.
Published: (2023)

MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models
by: Fan, Xiaoran, et al.
Published: (2026)

CrossVL: Complexity-Aware Feature Routing and Paired Curriculum for Cross-View Vision-Language Detection
by: Liu, Zhipeng, et al.
Published: (2026)

Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models
by: Chua, Jia Yun, et al.
Published: (2025)

Semantic Compositions Enhance Vision-Language Contrastive Learning
by: Aladago, Maxwell, et al.
Published: (2024)

Compositional Entailment Learning for Hyperbolic Vision-Language Models
by: Pal, Avik, et al.
Published: (2024)

Tree of Attributes Prompt Learning for Vision-Language Models
by: Ding, Tong, et al.
Published: (2024)

Parallel In-context Learning for Large Vision Language Models
by: Yamaguchi, Shin'ya, et al.
Published: (2026)

Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets
by: Koran, Alex, et al.
Published: (2026)

Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
by: Huang, Po-Hsuan, et al.
Published: (2024)

Adapting Vision-Language Models Without Labels: A Comprehensive Survey
by: Dong, Hao, et al.
Published: (2025)

Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
by: Kim, Gahyeon, et al.
Published: (2025)

AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
by: Kim, Gahyeon, et al.
Published: (2024)

On the Reproducibility of "FairCLIP: Harnessing Fairness in Vision-Language Learning"
by: Bakker, Hua Chang, et al.
Published: (2025)

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
by: Pach, Mateusz, et al.
Published: (2025)

Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
by: Yan, Hao, et al.
Published: (2024)

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
by: Wang, Zhengbo, et al.
Published: (2024)