| Main Authors: | Shen, Jialu; Lyu, Han; Zhong, Suyang; Li, Hanzheng; Tao, Haoyi; Wang, Nan; Chen, Changhong; Fang, Xi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.28039 |
Similar Items
OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding
by: Tao, Haoyi, et al.
Published: (2026)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
by: Han, Hongyong, et al.
Published: (2025)
BERT-VQA: Visual Question Answering on Plots
by: Vu, Tai, et al.
Published: (2025)
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
by: Chen, Pingyi, et al.
Published: (2024)
DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
by: Al-Mohannadi, Aisha, et al.
Published: (2026)
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
by: Kim, Yoonsik, et al.
Published: (2024)
SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory
by: Alam, Samiul, et al.
Published: (2026)
MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
by: Mao, Xianwei, et al.
Published: (2026)
MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering
by: Yang, Shuo, et al.
Published: (2025)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
by: Guan, Runwei, et al.
Published: (2025)
CommVQA: Situating Visual Question Answering in Communicative Contexts
by: Naik, Nandita Shankar, et al.
Published: (2024)
VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
by: Tran, Duong T., et al.
Published: (2025)
Towards Signboard-Oriented Visual Question Answering: ViSignVQA Dataset, Method and Benchmark
by: Nguyen, Hieu Minh, et al.
Published: (2025)
RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering
by: Zhang, Chengyi, et al.
Published: (2026)
IRPAPERS: A Visual Document Benchmark for Scientific Retrieval and Question Answering
by: Shorten, Connor, et al.
Published: (2026)
StackOverflowVQA: Stack Overflow Visual Question Answering Dataset
by: Mirzaei, Motahhare, et al.
Published: (2024)
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
by: Singh, Shubhankar, et al.
Published: (2024)
Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring
by: Vu, Sinh Trong, et al.
Published: (2025)
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
by: Chang, Eun, et al.
Published: (2025)
ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics
by: Van-Dinh, Tue-Thu, et al.
Published: (2025)
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
by: Huang, Chengyue, et al.
Published: (2025)
M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering
by: Ma, Jiatong, et al.
Published: (2026)
Uni-Parser Technical Report
by: Fang, Xi, et al.
Published: (2025)
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery
by: He, Runlong, et al.
Published: (2024)
RxnBench: A Multimodal Benchmark for Evaluating Large Language Models on Chemical Reaction Understanding from Scientific Literature
by: Li, Hanzheng, et al.
Published: (2025)
PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
by: Sakib, Syed Nazmus, et al.
Published: (2025)
MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models
by: Meng, Tian, et al.
Published: (2024)
VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images
by: Penzkofer, Anna, et al.
Published: (2024)
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
by: Ishmam, Md Farhan, et al.
Published: (2023)
Understanding and Reusing Test Suites Across Database Systems
by: Zhong, Suyang, et al.
Published: (2024)
SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
by: Drago, Mauro Orazio, et al.
Published: (2025)
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis
by: Fan, Lin, et al.
Published: (2024)
ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
by: Diao, Xingjian, et al.
Published: (2025)
Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations
by: Yeh, Yahsin, et al.
Published: (2025)
UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation
by: Ghosh, Shiv, et al.
Published: (2026)