:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Jiawen; Bian, Shiran; Zhu, Yihang; Tan, Wenbin; Zhang, Yachao; Xie, Yuan; Qu, Yanyun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.20758

Similar Items

PC-CrossDiff: Point-Cluster Dual-Level Cross-Modal Differential Attention for Unified 3D Referring and Segmentation
by: Tan, Wenbin, et al.
Published: (2026)

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
by: Xu, Runsen, et al.
Published: (2024)

Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers
by: Yuan, Shiran, et al.
Published: (2025)

EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection
by: Lei, Qinqian, et al.
Published: (2024)

Multi-Stage VLM Pipeline for Zero-Shot Traffic Accident Understanding
by: Tatematsu, Fumiya, et al.
Published: (2026)

Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation
by: Wu, Yao, et al.
Published: (2024)

HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
by: Lei, Qinqian, et al.
Published: (2025)

Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning
by: Habibpour, Mobin, et al.
Published: (2025)

MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)

Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
by: Shi, Jiangming, et al.
Published: (2024)

TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
by: Chen, Lihong, et al.
Published: (2025)

A Recipe for Improving Remote Sensing VLM Zero Shot Generalization
by: Barzilai, Aviad, et al.
Published: (2025)

Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation
by: Li, Jiahao, et al.
Published: (2025)

Learning Commonality, Divergence and Variety for Unsupervised Visible-Infrared Person Re-identification
by: Shi, Jiangming, et al.
Published: (2024)

Direct Segmentation without Logits Optimization for Training-Free Open-Vocabulary Semantic Segmentation
by: Li, Jiahao, et al.
Published: (2026)

EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models
by: Seo, Minjae, et al.
Published: (2025)

Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
by: Wu, Chengyue, et al.
Published: (2026)

Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective
by: Li, Jiahao, et al.
Published: (2025)

VLM-Guided Experience Replay
by: Sharony, Elad, et al.
Published: (2026)

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models
by: Cai, Kaitong, et al.
Published: (2025)

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
by: Zhang, Di, et al.
Published: (2024)

IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding
by: Li, Junxian, et al.
Published: (2025)

VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition
by: Zhang, Zaiwei, et al.
Published: (2024)

Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance
by: Qiu, Jason, et al.
Published: (2026)

Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification
by: Yin, Xiangbo, et al.
Published: (2024)

ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models
by: Sural, Shounak, et al.
Published: (2024)

VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization
by: Waheed, Sania, et al.
Published: (2025)

MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM
by: Chen, Tao, et al.
Published: (2025)

GoalVLM: VLM-driven Object Goal Navigation for Multi-Agent System
by: James, MoniJesu, et al.
Published: (2026)

SwarmVLM: VLM-Guided Impedance Control for Autonomous Navigation of Heterogeneous Robots in Dynamic Warehousing
by: Zafar, Malaika, et al.
Published: (2025)

REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation
by: Xue, Xizhe, et al.
Published: (2024)

CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation
by: Zuo, Zuo, et al.
Published: (2024)

Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement
by: Liu, Yang, et al.
Published: (2025)

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
by: Li, Xiaofan, et al.
Published: (2024)

ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
by: Zhang, Yiming, et al.
Published: (2026)

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
by: Wang, Yuxin, et al.
Published: (2025)

EchoVLM: Measurement-Grounded Multimodal Learning for Echocardiography
by: Li, Yuheng, et al.
Published: (2025)

Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification
by: Zhang, Zhizhong, et al.
Published: (2024)

DocVLM: Make Your VLM an Efficient Reader
by: Nacson, Mor Shpigel, et al.
Published: (2024)

VLM-Vac: Enhancing Smart Vacuums through VLM Knowledge Distillation and Language-Guided Experience Replay
by: Mirjalili, Reihaneh, et al.
Published: (2024)