| Main Authors: | Xu, Quanxing, Zhou, Ling, Zhang, Feifei, Tian, Jinyu, Huang, Rubing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.12131 |
Similar Items
MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
by: Xu, Quanxing, et al.
Published: (2026)
QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning
by: Xu, Quanxing, et al.
Published: (2025)
Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)
Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes
by: Kim, Yehna, et al.
Published: (2025)
Modularized Zero-shot VQA with Pre-trained Models
by: Cao, Rui, et al.
Published: (2023)
EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
by: Yang, Xiangpeng, et al.
Published: (2024)
COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing
by: Li, Yuqi, et al.
Published: (2024)
Boosting Audio-visual Zero-shot Learning with Large Language Models
by: Chen, Haoxing, et al.
Published: (2023)
Knowledge Generation for Zero-shot Knowledge-based VQA
by: Cao, Rui, et al.
Published: (2024)
VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
by: Wu, Pengying, et al.
Published: (2024)
AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
by: Zhou, Qihang, et al.
Published: (2023)
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
by: Wu, Yongjian, et al.
Published: (2024)
Chain of Visual Perception: Harnessing Multimodal Large Language Models for Zero-shot Camouflaged Object Detection
by: Tang, Lv, et al.
Published: (2023)
Enhancing Zero-shot Counting via Language-guided Exemplar Learning
by: Wang, Mingjie, et al.
Published: (2024)
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)
Attribute Distribution Modeling and Semantic-Visual Alignment for Generative Zero-shot Learning
by: Pu, Haojie, et al.
Published: (2026)
Towards Zero-shot Human-Object Interaction Detection via Vision-Language Integration
by: Xue, Weiying, et al.
Published: (2024)
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
by: Li, Jiachen, et al.
Published: (2025)
Unified Language-driven Zero-shot Domain Adaptation
by: Yang, Senqiao, et al.
Published: (2024)
LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?
by: Li, Bangyan, et al.
Published: (2025)
Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection
by: Deng, Jieren, et al.
Published: (2024)
AnyDoor: Zero-shot Object-level Image Customization
by: Chen, Xi, et al.
Published: (2023)
Zero-shot Object Counting with Good Exemplars
by: Zhu, Huilin, et al.
Published: (2024)
Discover, Segment, and Select: A Progressive Mechanism for Zero-shot Camouflaged Object Segmentation
by: Yang, Yilong, et al.
Published: (2026)
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration
by: Wei, Lai, et al.
Published: (2024)
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
by: Aklilu, Josiah, et al.
Published: (2024)
Zero-shot Face Editing via ID-Attribute Decoupled Inversion
by: Hou, Yang, et al.
Published: (2025)
Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
by: Deng, Yingying, et al.
Published: (2025)
Benchmarking Vision-Language and Multimodal Large Language Models in Zero-shot and Few-shot Scenarios: A study on Christian Iconography
by: Spinaci, Gianmarco, et al.
Published: (2025)
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
by: Zhou, Runjie, et al.
Published: (2026)
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis
by: Chen, Zhe, et al.
Published: (2025)
RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
by: Li, Junjie, et al.
Published: (2025)
Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models
by: Xing, Wenbin, et al.
Published: (2026)
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
by: Xu, Yifang, et al.
Published: (2025)
ZeroPose: CAD-Prompted Zero-shot Object 6D Pose Estimation in Cluttered Scenes
by: Chen, Jianqiu, et al.
Published: (2023)
Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification
by: Wang, Shijian, et al.
Published: (2025)
Understanding Multi-Agent Reasoning with Large Language Models for Cartoon VQA
by: Wu, Tong, et al.
Published: (2026)
Model Synthesis for Zero-Shot Model Attribution
by: Yang, Tianyun, et al.
Published: (2023)
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection
by: Zhang, Jiangning, et al.
Published: (2023)
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
by: Tur, Anil Osman, et al.
Published: (2024)