| Main Authors: | Al-Mohannadi, Aisha; Firoz, Ayisha; Yang, Yin; Imran, Muhammad; Ofli, Ferda |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.13839 |
Similar Items
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery
by: Al-Emadi, Sara, et al.
Published: (2025)
VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond
by: Al-Emadi, Noora, et al.
Published: (2025)
Bias-Aware Face Mask Detection Dataset
by: Kantarcı, Alperen, et al.
Published: (2022)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
Monitoring Critical Infrastructure Facilities During Disasters Using Large Language Models
by: Ziaullah, Abdul Wahab, et al.
Published: (2024)
RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
by: Guan, Runwei, et al.
Published: (2025)
Landslide Detection in Real-Time Social Media Image Streams
by: Ofli, Ferda, et al.
Published: (2021)
StackOverflowVQA: Stack Overflow Visual Question Answering Dataset
by: Mirzaei, Motahhare, et al.
Published: (2024)
GeoResponder: Towards Building Geospatial LLMs for Time-Critical Disaster Response
by: Zguir, Ahmed El Fekih, et al.
Published: (2025)
BERT-VQA: Visual Question Answering on Plots
by: Vu, Tai, et al.
Published: (2025)
Towards Signboard-Oriented Visual Question Answering: ViSignVQA Dataset, Method and Benchmark
by: Nguyen, Hieu Minh, et al.
Published: (2025)
Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering
by: Maryam, Hiba, et al.
Published: (2024)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering
by: Zhang, Chengyi, et al.
Published: (2026)
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
by: Tran, Duong T., et al.
Published: (2025)
CommVQA: Situating Visual Question Answering in Communicative Contexts
by: Naik, Nandita Shankar, et al.
Published: (2024)
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
ZeShot-VQA: Zero-Shot Visual Question Answering Framework with Answer Mapping for Natural Disaster Damage Assessment
by: Karimi, Ehsan, et al.
Published: (2025)
DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment
by: Tehrani, Sara, et al.
Published: (2026)
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
by: Kim, Yoonsik, et al.
Published: (2024)
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
by: Zhou, Sheng, et al.
Published: (2025)
SurgViVQA: Temporally-Grounded Video Question Answering for Surgical Scene Understanding
by: Drago, Mauro Orazio, et al.
Published: (2025)
VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)
PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
by: Sakib, Syed Nazmus, et al.
Published: (2025)
MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
by: Chen, Pingyi, et al.
Published: (2024)
Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs
by: Wang, Jialou, et al.
Published: (2024)
Open-Vocabulary vs Supervised Learning Methods for Post-Disaster Visual Scene Understanding
by: Michailidou, Anna, et al.
Published: (2026)
AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering
by: Tuong, Nguyen Anh, et al.
Published: (2026)
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
by: Han, Hongyong, et al.
Published: (2025)
MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering
by: Li, Zhifei, et al.
Published: (2026)
Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring
by: Vu, Sinh Trong, et al.
Published: (2025)
M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering
by: Ma, Jiatong, et al.
Published: (2026)
ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics
by: Van-Dinh, Tue-Thu, et al.
Published: (2025)
STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
by: Ishihara, Keishi, et al.
Published: (2025)
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
by: Qian, Tianwen, et al.
Published: (2023)
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery
by: He, Runlong, et al.
Published: (2024)
RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
by: Butsanets, Léo, et al.
Published: (2025)
VinDr-CXR-VQA: A Visual Question Answering Dataset for Explainable Chest X-Ray Analysis with Multi-Task Learning
by: Nguyen, Dang H., et al.
Published: (2025)
CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering
by: Hong, Yuyang, et al.
Published: (2026)