| Main Authors: | Sun, Yuhao, Cai, Chengyi, Zhang, Jiacheng, Ye, Zesheng, Yuan, Xingliang, Liu, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.20419 |
Similar Items
Attribute-based Visual Reprogramming for Vision-Language Models
by: Cai, Chengyi, et al.
Published: (2025)
Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts
by: Cai, Chengyi, et al.
Published: (2025)
Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning
by: Cai, Chengyi, et al.
Published: (2026)
Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification
by: Sun, Yuhao, et al.
Published: (2025)
Bayesian-guided Label Mapping for Visual Reprogramming
by: Cai, Chengyi, et al.
Published: (2024)
Sample-specific Masks for Visual Reprogramming-based Prompting
by: Cai, Chengyi, et al.
Published: (2024)
Bi-MCQ: Reformulating Vision-Language Alignment for Negation Understanding
by: Kim, Tae Hun, et al.
Published: (2026)
BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models
by: Hu, Xuefeng, et al.
Published: (2024)
Kernelized Sparse Fine-Tuning with Bi-level Parameter Competition for Vision Models
by: Shen, Shufan, et al.
Published: (2025)
Large Vision Model-Guided Masked Low-Rank Approximation for Ground-Roll Attenuation
by: Liao, Jiacheng, et al.
Published: (2026)
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval
by: Ma, Yiwei, et al.
Published: (2024)
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
by: Xie, Chunyu, et al.
Published: (2025)
Infusing fine-grained visual knowledge to Vision-Language Models
by: Ypsilantis, Nikolaos-Antonios, et al.
Published: (2025)
How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking
by: Li, Xuchen, et al.
Published: (2024)
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
by: Miranda, Imanol, et al.
Published: (2024)
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)
BiGain: Unified Token Compression for Joint Generation and Classification
by: Liu, Jiacheng, et al.
Published: (2026)
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
by: Guan, Kaisi, et al.
Published: (2025)
Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
by: Ishmam, Alvi Md, et al.
Published: (2024)
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
by: Ying, Zonghao, et al.
Published: (2024)
Vision-Language Feature Alignment for Road Anomaly Segmentation
by: He, Zhuolin, et al.
Published: (2026)
MAIL++: Multi-Modal Bi-directional Agent Layer for Vision-Language Models
by: Chen, Kaixiang, et al.
Published: (2026)
BiPVL-Seg: Bidirectional Progressive Vision-Language Fusion with Global-Local Alignment for Medical Image Segmentation
by: Sultan, Rafi Ibn, et al.
Published: (2025)
GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations
by: Li, Zesheng, et al.
Published: (2026)
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention
by: Long, Nguyen Huu Bao, et al.
Published: (2024)
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
by: Shi, Yulong, et al.
Published: (2023)
VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?
by: Liu, Qing'an, et al.
Published: (2026)
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
by: Luo, Chuwei, et al.
Published: (2022)
Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector
by: Li, Sifan, et al.
Published: (2025)
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
by: Sun, Yuhao, et al.
Published: (2025)
3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale
by: Fan, Yijia, et al.
Published: (2025)
KETA: Kinematic-Phrases-Enhanced Text-to-Motion Generation via Fine-grained Alignment
by: Jiang, Yu, et al.
Published: (2025)
Cross-modal Full-mode Fine-grained Alignment for Text-to-Image Person Retrieval
by: Yin, Hao, et al.
Published: (2025)
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
by: Wang, Zhecan, et al.
Published: (2023)
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
by: Jing, Liqiang, et al.
Published: (2023)
Adapting Vision-Language Model with Fine-grained Semantics for Open-Vocabulary Segmentation
by: Chng, Yong Xien, et al.
Published: (2024)
Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
by: Jia, Kaidi, et al.
Published: (2026)
Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations
by: Cui, Yibo, et al.
Published: (2025)
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
by: Li, Jinhao, et al.
Published: (2024)