:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Quanxing, Zhou, Ling, Zhang, Feifei, Tian, Jinyu, Huang, Rubing
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.12131

Similar Items

MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
by: Xu, Quanxing, et al.
Published: (2026)

QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning
by: Xu, Quanxing, et al.
Published: (2025)

Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)

Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes
by: Kim, Yehna, et al.
Published: (2025)

Modularized Zero-shot VQA with Pre-trained Models
by: Cao, Rui, et al.
Published: (2023)

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
by: Yang, Xiangpeng, et al.
Published: (2024)

COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing
by: Li, Yuqi, et al.
Published: (2024)

Boosting Audio-visual Zero-shot Learning with Large Language Models
by: Chen, Haoxing, et al.
Published: (2023)

Knowledge Generation for Zero-shot Knowledge-based VQA
by: Cao, Rui, et al.
Published: (2024)

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
by: Wu, Pengying, et al.
Published: (2024)

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
by: Zhou, Qihang, et al.
Published: (2023)

AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
by: Wu, Yongjian, et al.
Published: (2024)

Chain of Visual Perception: Harnessing Multimodal Large Language Models for Zero-shot Camouflaged Object Detection
by: Tang, Lv, et al.
Published: (2023)

Enhancing Zero-shot Counting via Language-guided Exemplar Learning
by: Wang, Mingjie, et al.
Published: (2024)

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)

Attribute Distribution Modeling and Semantic-Visual Alignment for Generative Zero-shot Learning
by: Pu, Haojie, et al.
Published: (2026)

Towards Zero-shot Human-Object Interaction Detection via Vision-Language Integration
by: Xue, Weiying, et al.
Published: (2024)

LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
by: Li, Jiachen, et al.
Published: (2025)

Unified Language-driven Zero-shot Domain Adaptation
by: Yang, Senqiao, et al.
Published: (2024)

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?
by: Li, Bangyan, et al.
Published: (2025)

Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection
by: Deng, Jieren, et al.
Published: (2024)

AnyDoor: Zero-shot Object-level Image Customization
by: Chen, Xi, et al.
Published: (2023)

Zero-shot Object Counting with Good Exemplars
by: Zhu, Huilin, et al.
Published: (2024)

Discover, Segment, and Select: A Progressive Mechanism for Zero-shot Camouflaged Object Segmentation
by: Yang, Yilong, et al.
Published: (2026)

MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration
by: Wei, Lai, et al.
Published: (2024)

Zero-shot Action Localization via the Confidence of Large Vision-Language Models
by: Aklilu, Josiah, et al.
Published: (2024)

Zero-shot Face Editing via ID-Attribute Decoupled Inversion
by: Hou, Yang, et al.
Published: (2025)

Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
by: Deng, Yingying, et al.
Published: (2025)

Benchmarking Vision-Language and Multimodal Large Language Models in Zero-shot and Few-shot Scenarios: A study on Christian Iconography
by: Spinaci, Gianmarco, et al.
Published: (2025)

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
by: Zhou, Runjie, et al.
Published: (2026)

FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis
by: Chen, Zhe, et al.
Published: (2025)

RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
by: Li, Junjie, et al.
Published: (2025)

Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models
by: Xing, Wenbin, et al.
Published: (2026)

Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
by: Xu, Yifang, et al.
Published: (2025)

ZeroPose: CAD-Prompted Zero-shot Object 6D Pose Estimation in Cluttered Scenes
by: Chen, Jianqiu, et al.
Published: (2023)

Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification
by: Wang, Shijian, et al.
Published: (2025)

Understanding Multi-Agent Reasoning with Large Language Models for Cartoon VQA
by: Wu, Tong, et al.
Published: (2026)

Model Synthesis for Zero-Shot Model Attribution
by: Yang, Tianyun, et al.
Published: (2023)

GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection
by: Zhang, Jiangning, et al.
Published: (2023)

Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
by: Tur, Anil Osman, et al.
Published: (2024)