| Main Authors: | Chen, Ziliang, Huang, Xin, Guan, Quanlong, Lin, Liang, Luo, Weiqi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.00191 |
Similar Items
Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models
by: Fu, Shuai, et al.
Published: (2024)
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
by: Li, Lin, et al.
Published: (2024)
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
by: Kim, Kwanyoung, et al.
Published: (2024)
RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models
by: Lin, Xiang, et al.
Published: (2025)
Differentiable Prompt Learning for Vision Language Models
by: Huang, Zhenhan, et al.
Published: (2024)
DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving
by: Hu, Haibo, et al.
Published: (2025)
Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models
by: Huang, Chengyue, et al.
Published: (2025)
VLLFL: A Vision-Language Model Based Lightweight Federated Learning Framework for Smart Agriculture
by: Li, Long, et al.
Published: (2025)
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
by: Luo, Run, et al.
Published: (2025)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
by: Yang, Senqiao, et al.
Published: (2025)
Multi-Modal Adapter for Vision-Language Models
by: Seputis, Dominykas, et al.
Published: (2024)
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
by: Zhao, Yunhan, et al.
Published: (2024)
Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique
by: Zhang, Zhehan, et al.
Published: (2026)
Retrospective Learning from Interactions
by: Chen, Zizhao, et al.
Published: (2024)
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models
by: Lian, Chenyu, et al.
Published: (2025)
Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
by: Li, Qixiu, et al.
Published: (2025)
Open-CRB: Towards Open World Active Learning for 3D Object Detection
by: Chen, Zhuoxiao, et al.
Published: (2023)
Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents
by: Zhang, Zhizhen, et al.
Published: (2025)
Composition Vision-Language Understanding via Segment and Depth Anything Model
by: Huo, Mingxiao, et al.
Published: (2024)
FastVLM: Efficient Vision Encoding for Vision Language Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)
To Trust Or Not To Trust Your Vision-Language Model's Prediction
by: Dong, Hao, et al.
Published: (2025)
PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation
by: Wahed, Muntasir, et al.
Published: (2024)
Prismer: A Vision-Language Model with Multi-Task Experts
by: Liu, Shikun, et al.
Published: (2023)
MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models
by: Fan, Xiaoran, et al.
Published: (2026)
CrossVL: Complexity-Aware Feature Routing and Paired Curriculum for Cross-View Vision-Language Detection
by: Liu, Zhipeng, et al.
Published: (2026)
Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models
by: Chua, Jia Yun, et al.
Published: (2025)
Semantic Compositions Enhance Vision-Language Contrastive Learning
by: Aladago, Maxwell, et al.
Published: (2024)
Compositional Entailment Learning for Hyperbolic Vision-Language Models
by: Pal, Avik, et al.
Published: (2024)
Tree of Attributes Prompt Learning for Vision-Language Models
by: Ding, Tong, et al.
Published: (2024)
Parallel In-context Learning for Large Vision Language Models
by: Yamaguchi, Shin'ya, et al.
Published: (2026)
Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets
by: Koran, Alex, et al.
Published: (2026)
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
by: Huang, Po-Hsuan, et al.
Published: (2024)
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
by: Dong, Hao, et al.
Published: (2025)
Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
by: Kim, Gahyeon, et al.
Published: (2025)
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
by: Kim, Gahyeon, et al.
Published: (2024)
On the Reproducibility of "FairCLIP: Harnessing Fairness in Vision-Language Learning"
by: Bakker, Hua Chang, et al.
Published: (2025)
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
by: Pach, Mateusz, et al.
Published: (2025)
Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
by: Yan, Hao, et al.
Published: (2024)
Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
by: Wang, Zhengbo, et al.
Published: (2024)