| Main Authors: | Wang, Jialou, Zhu, Manli, Li, Yulei, Li, Honglei, Yang, Longzhi, Woo, Wai Lok |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.01151 |
Similar Items
Geometric Features Enhanced Human-Object Interaction Detection
by: Zhu, Manli, et al.
Published: (2024)
BERT-VQA: Visual Question Answering on Plots
by: Vu, Tai, et al.
Published: (2025)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
by: Chen, Pingyi, et al.
Published: (2024)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)
DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
by: Al-Mohannadi, Aisha, et al.
Published: (2026)
StackOverflowVQA: Stack Overflow Visual Question Answering Dataset
by: Mirzaei, Motahhare, et al.
Published: (2024)
MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering
by: Li, Zhifei, et al.
Published: (2026)
CommVQA: Situating Visual Question Answering in Communicative Contexts
by: Naik, Nandita Shankar, et al.
Published: (2024)
MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)
Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring
by: Vu, Sinh Trong, et al.
Published: (2025)
MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
by: Mao, Xianwei, et al.
Published: (2026)
Object Attribute Matters in Visual Question Answering
by: Li, Peize, et al.
Published: (2023)
SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering
by: Zhang, Yan, et al.
Published: (2025)
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
by: Zhou, Sheng, et al.
Published: (2025)
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
by: Tran, Duong T., et al.
Published: (2025)
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery
by: He, Runlong, et al.
Published: (2024)
RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering
by: Zhang, Chengyi, et al.
Published: (2026)
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
by: Han, Hongyong, et al.
Published: (2025)
LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs
by: Li, Yunxin, et al.
Published: (2024)
Integrating Object Interaction Self-Attention and GAN-Based Debiasing for Visual Question Answering
by: Li, Zhifei, et al.
Published: (2025)
CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering
by: Hong, Yuyang, et al.
Published: (2026)
Patch-level Sounding Object Tracking for Audio-Visual Question Answering
by: Li, Zhangbin, et al.
Published: (2024)
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
by: Kim, Yoonsik, et al.
Published: (2024)
Towards Signboard-Oriented Visual Question Answering: ViSignVQA Dataset, Method and Benchmark
by: Nguyen, Hieu Minh, et al.
Published: (2025)
RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
by: Guan, Runwei, et al.
Published: (2025)
AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation
by: Hu, Rongsheng, et al.
Published: (2026)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
Query-Guided Spatial-Temporal-Frequency Interaction for Music Audio-Visual Question Answering
by: Li, Kun, et al.
Published: (2026)
Structure Causal Models and LLMs Integration in Medical Visual Question Answering
by: Xu, Zibo, et al.
Published: (2025)
Object Retrieval for Visual Question Answering with Outside Knowledge
by: Kan, Shichao, et al.
Published: (2024)
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
by: Singh, Shubhankar, et al.
Published: (2024)
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
by: Li, Shuai, et al.
Published: (2025)
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
by: Pham, Tan-Hanh, et al.
Published: (2024)
Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering
by: Tao, Qian, et al.
Published: (2025)
GHR-VQA: Graph-guided Hierarchical Relational Reasoning for Video Question Answering
by: Brilli, Dionysia Danai, et al.
Published: (2025)
ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
by: Diao, Xingjian, et al.
Published: (2025)
Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations
by: Yeh, Yahsin, et al.
Published: (2025)