| Main Authors: | Lin, Jiawen, Bian, Shiran, Zhu, Yihang, Tan, Wenbin, Zhang, Yachao, Xie, Yuan, Qu, Yanyun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.20758 |
Similar Items
PC-CrossDiff: Point-Cluster Dual-Level Cross-Modal Differential Attention for Unified 3D Referring and Segmentation
by: Tan, Wenbin, et al.
Published: (2026)
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
by: Xu, Runsen, et al.
Published: (2024)
Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers
by: Yuan, Shiran, et al.
Published: (2025)
EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection
by: Lei, Qinqian, et al.
Published: (2024)
Multi-Stage VLM Pipeline for Zero-Shot Traffic Accident Understanding
by: Tatematsu, Fumiya, et al.
Published: (2026)
Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation
by: Wu, Yao, et al.
Published: (2024)
HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
by: Lei, Qinqian, et al.
Published: (2025)
Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning
by: Habibpour, Mobin, et al.
Published: (2025)
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
by: Shi, Jiangming, et al.
Published: (2024)
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
by: Chen, Lihong, et al.
Published: (2025)
A Recipe for Improving Remote Sensing VLM Zero Shot Generalization
by: Barzilai, Aviad, et al.
Published: (2025)
Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation
by: Li, Jiahao, et al.
Published: (2025)
Learning Commonality, Divergence and Variety for Unsupervised Visible-Infrared Person Re-identification
by: Shi, Jiangming, et al.
Published: (2024)
Direct Segmentation without Logits Optimization for Training-Free Open-Vocabulary Semantic Segmentation
by: Li, Jiahao, et al.
Published: (2026)
EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models
by: Seo, Minjae, et al.
Published: (2025)
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
by: Wu, Chengyue, et al.
Published: (2026)
Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective
by: Li, Jiahao, et al.
Published: (2025)
VLM-Guided Experience Replay
by: Sharony, Elad, et al.
Published: (2026)
FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models
by: Cai, Kaitong, et al.
Published: (2025)
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
by: Zhang, Di, et al.
Published: (2024)
IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding
by: Li, Junxian, et al.
Published: (2025)
VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition
by: Zhang, Zaiwei, et al.
Published: (2024)
Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance
by: Qiu, Jason, et al.
Published: (2026)
Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification
by: Yin, Xiangbo, et al.
Published: (2024)
ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models
by: Sural, Shounak, et al.
Published: (2024)
VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization
by: Waheed, Sania, et al.
Published: (2025)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM
by: Chen, Tao, et al.
Published: (2025)
GoalVLM: VLM-driven Object Goal Navigation for Multi-Agent System
by: James, MoniJesu, et al.
Published: (2026)
SwarmVLM: VLM-Guided Impedance Control for Autonomous Navigation of Heterogeneous Robots in Dynamic Warehousing
by: Zafar, Malaika, et al.
Published: (2025)
REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation
by: Xue, Xizhe, et al.
Published: (2024)
CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation
by: Zuo, Zuo, et al.
Published: (2024)
Bootstrapping Physics-Grounded Video Generation through VLM-Guided Iterative Self-Refinement
by: Liu, Yang, et al.
Published: (2025)
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
by: Li, Xiaofan, et al.
Published: (2024)
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
by: Zhang, Yiming, et al.
Published: (2026)
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
by: Wang, Yuxin, et al.
Published: (2025)
EchoVLM: Measurement-Grounded Multimodal Learning for Echocardiography
by: Li, Yuheng, et al.
Published: (2025)
Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification
by: Zhang, Zhizhong, et al.
Published: (2024)
DocVLM: Make Your VLM an Efficient Reader
by: Nacson, Mor Shpigel, et al.
Published: (2024)
VLM-Vac: Enhancing Smart Vacuums through VLM Knowledge Distillation and Language-Guided Experience Replay
by: Mirjalili, Reihaneh, et al.
Published: (2024)