| Main Authors: | Zhu, Hao; Jin, Shuo; Liao, Wenbin; Xiao, Jiayu; Zhu, Yan; Yu, Siyue; Dai, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.12325 |
Similar Items
Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation
by: Zhu, Hao, et al.
Published: (2024)
TALENT: Target-aware Efficient Tuning for Referring Image Segmentation
by: Jin, Shuo, et al.
Published: (2026)
VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models
by: Meftah, Hanene F. Z. Brachemi, et al.
Published: (2025)
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
by: Wang, Feng, et al.
Published: (2023)
RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
by: Wang, Boyang, et al.
Published: (2026)
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model
by: Yang, Jinze, et al.
Published: (2024)
Human-Free Automated Prompting for Vision-Language Anomaly Detection: Prompt Optimization with Meta-guiding Prompt Scheme
by: Chen, Pi-Wei, et al.
Published: (2024)
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting
by: Zhu, Xingyu, et al.
Published: (2024)
Quantized Prompt for Efficient Generalization of Vision-Language Models
by: Hao, Tianxiang, et al.
Published: (2024)
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
by: Lan, Mengcheng, et al.
Published: (2024)
TRIO: Token Reduction via Inference-Objective Guidance for Efficient Vision-Language Models
by: Zhang, Haokui, et al.
Published: (2026)
TF-SSD: A Strong Pipeline via Synergic Mask Filter for Training-free Co-salient Object Detection
by: He, Zhijin, et al.
Published: (2026)
Text Prompt Injection of Vision Language Models
by: Zhu, Ruizhe
Published: (2025)
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
by: Zheng, Henry, et al.
Published: (2025)
Efficient Test-Time Prompt Tuning for Vision-Language Models
by: Zhu, Yuhan, et al.
Published: (2024)
Consistency-guided Prompt Learning for Vision-Language Models
by: Roy, Shuvendu, et al.
Published: (2023)
EvoCut: Multi-Layer Evolution-Aware Visual Token Compression for Efficient Large Vision-Language Models
by: Lu, Hongyu, et al.
Published: (2026)
Selective Vision-Language Subspace Projection for Few-shot CLIP
by: Zhu, Xingyu, et al.
Published: (2024)
Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models
by: Zhou, Yuchen, et al.
Published: (2025)
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
by: Yu, Jun, et al.
Published: (2024)
FCoT-VL:Advancing Text-oriented Large Vision-Language Models with Efficient Visual Token Compression
by: Li, Jianjian, et al.
Published: (2025)
Causality-guided Prompt Learning for Vision-language Models via Visual Granulation
by: Gao, Mengyu, et al.
Published: (2025)
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
by: Li, Zheng, et al.
Published: (2024)
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
by: Meng, Fanqing, et al.
Published: (2024)
OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference
by: Chen, Wei, et al.
Published: (2024)
Revisiting Prompt Pretraining of Vision-Language Models
by: Chen, Zhenyuan, et al.
Published: (2024)
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
by: Duan, Yuchen, et al.
Published: (2024)
Intrinsic Gradient Suppression for Label-Noise Prompt Tuning in Vision-Language Models
by: Li, Jiayu, et al.
Published: (2026)
Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations
by: Zhu, Kangyu, et al.
Published: (2025)
Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference
by: Cahyani, Putu Indah Githa, et al.
Published: (2025)
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
by: Zhang, Yuan, et al.
Published: (2024)
Q-Tacit: Image Quality Assessment via Latent Visual Reasoning
by: Jiang, Yuxuan, et al.
Published: (2026)
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
by: Zhu, Lianghui, et al.
Published: (2024)
V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models
by: Kim, Jisoo, et al.
Published: (2025)
Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing
by: Wang, Kejie, et al.
Published: (2024)
VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs
by: Zhu, Jiaying, et al.
Published: (2025)
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
by: Ge, Junqi, et al.
Published: (2024)
Visual Prompt Engineering for Vision Language Models in Radiology
by: Denner, Stefan, et al.
Published: (2024)
Visual Prompt-Agnostic Evolution
by: Wang, Junze, et al.
Published: (2026)
Low-rank Prompt Interaction for Continual Vision-Language Retrieval
by: Yan, Weicai, et al.
Published: (2025)