| Main Authors: | Hasan, Mohammed Rakibul, Majid, Rafi, Tahmid, Ahanaf |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.19887 |
Similar Items
How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking
by: Ahmed, Rafid, et al.
Published: (2026)
BanglaQuAD: A Bengali Open-domain Question Answering Dataset
by: Rony, Md Rashad Al Hasan, et al.
Published: (2024)
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla
by: Barua, Deeparghya Dutta, et al.
Published: (2024)
Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects
by: Rubaiyeat, Husne Ara, et al.
Published: (2025)
A Two-Stage Multitask Vision-Language Framework for Explainable Crop Disease Visual Question Answering
by: Hossain, Md. Zahid, et al.
Published: (2026)
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
by: Inadumi, Shun, et al.
Published: (2024)
LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
by: Gu, Tiancheng, et al.
Published: (2024)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
Introducing A Bangla Sentence - Gloss Pair Dataset for Bangla Sign Language Translation and Research
by: Saha, Neelavro, et al.
Published: (2025)
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
by: Ding, Yihao, et al.
Published: (2024)
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations
by: Ford, James, et al.
Published: (2024)
Real-time Bangla Sign Language Translator
by: Pranto, Rotan Hawlader, et al.
Published: (2024)
CommVQA: Situating Visual Question Answering in Communicative Contexts
by: Naik, Nandita Shankar, et al.
Published: (2024)
Multimodal Integration of Human-Like Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration
by: Nguyen, Ngoc Son, et al.
Published: (2024)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
DermaBench: A Clinician-Annotated Benchmark Dataset for Dermatology Visual Question Answering and Reasoning
by: Yilmaz, Abdurrahim, et al.
Published: (2026)
Less Is More? Selective Visual Attention to High-Importance Regions for Multimodal Radiology Summarization
by: Naznin, Mst. Fahmida Sultana, et al.
Published: (2026)
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
by: Ma, Ziyu, et al.
Published: (2024)
Computed Tomography Visual Question Answering with Cross-modal Feature Graphing
by: Tian, Yuanhe, et al.
Published: (2025)
Large Vision-Language Models for Remote Sensing Visual Question Answering
by: Siripong, Surasakdi, et al.
Published: (2024)
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
by: Su, Tongkun, et al.
Published: (2024)
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
by: Awal, Rabiul, et al.
Published: (2023)
AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning
by: Zhang, Peifeng, et al.
Published: (2026)
Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation
by: Peng, Daowan, et al.
Published: (2025)
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs
by: Zhang, Jiarui, et al.
Published: (2023)
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
by: Pramanick, Shraman, et al.
Published: (2024)
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
by: Ha, Cuong Nhat, et al.
Published: (2024)
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey
by: Kuang, Jiayi, et al.
Published: (2024)
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
by: Serra, Francesco Dalla, et al.
Published: (2025)
RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
by: Butsanets, Léo, et al.
Published: (2025)
LOVA3: Learning to Visual Question Answering, Asking and Assessment
by: Zhao, Henry Hengyuan, et al.
Published: (2024)
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
by: Thai, Triet Minh, et al.
Published: (2023)
MOTOR: Multimodal Optimal Transport via Grounded Retrieval in Medical Visual Question Answering
by: Shaaban, Mai A., et al.
Published: (2025)
Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering
by: Kim, Jongha, et al.
Published: (2026)
Exploring Diverse Methods in Visual Question Answering
by: Li, Panfeng, et al.
Published: (2024)
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
by: Kil, Jihyung, et al.
Published: (2024)
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
by: Cho, Yeongjae, et al.
Published: (2024)
Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering
by: Fu, Xingyu, et al.
Published: (2023)