| Main Authors: | Xu, Quanxing, Zhou, Ling, Zhong, Xian, Zhang, Feifei, Huang, Rubing, Lin, Chia-Wen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.03337 |
Similar Items
Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)
MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
by: Xu, Quanxing, et al.
Published: (2026)
OAD-Promoter: Enhancing Zero-shot VQA using Large Language Models with Object Attribute Description
by: Xu, Quanxing, et al.
Published: (2025)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering
by: Ahir, Param, et al.
Published: (2023)
Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)
Unifying Image Processing as Visual Prompting Question Answering
by: Liu, Yihao, et al.
Published: (2023)
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
by: Li, Guangyao, et al.
Published: (2024)
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)
Visual Question Answering on Multiple Remote Sensing Image Modalities
by: Boussaid, Hichem, et al.
Published: (2025)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
by: Romero, David, et al.
Published: (2024)
Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
by: Park, Kyu Ri, et al.
Published: (2024)
Object Attribute Matters in Visual Question Answering
by: Li, Peize, et al.
Published: (2023)
VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)
Towards Flexible Evaluation for Generative Visual Question Answering
by: Ji, Huishan, et al.
Published: (2024)
Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)
Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
by: Tang, Jingqun, et al.
Published: (2024)
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
by: Chen, Pingyi, et al.
Published: (2024)
Reconstruction as a Bridge for Event-Based Visual Question Answering
by: Lou, Hanyue, et al.
Published: (2025)
Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation
by: Liu, Gang, et al.
Published: (2024)
Describe Anything Model for Visual Question Answering on Text-rich Images
by: Vu, Yen-Linh, et al.
Published: (2025)
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
by: Lin, Yuxin, et al.
Published: (2025)
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
by: Chaybouti, Sofian, et al.
Published: (2025)
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering
by: Huang, Xiaofei, et al.
Published: (2022)
FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering
by: Zhong, Liangyu, et al.
Published: (2025)
Prompt-based Personalized Federated Learning for Medical Visual Question Answering
by: Zhu, He, et al.
Published: (2024)
LOVA3: Learning to Visual Question Answering, Asking and Assessment
by: Zhao, Henry Hengyuan, et al.
Published: (2024)
Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images
by: Tosato, Lucrezia, et al.
Published: (2024)
DriveLM: Driving with Graph Visual Question Answering
by: Sima, Chonghao, et al.
Published: (2023)
Object Retrieval for Visual Question Answering with Outside Knowledge
by: Kan, Shichao, et al.
Published: (2024)
Structure Causal Models and LLMs Integration in Medical Visual Question Answering
by: Xu, Zibo, et al.
Published: (2025)
TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering
by: Rajkhowa, Tonmoy, et al.
Published: (2024)
Foundational Question Generation for Video Question Answering via an Embedding-Integrated Approach
by: Oh, Ju-Young
Published: (2025)
Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering
by: Chen, Zhuohong, et al.
Published: (2026)