| Main Authors: | Zhang, Junkai, Li, Bin, Zhou, Shoujun, Du, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.03135 |
Similar Items
Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge
by: Li, Bin, et al.
Published: (2025)
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
by: Zhang, Zhilin, et al.
Published: (2024)
Q-FSRU: Quantum-Augmented Frequency-Spectral Fusion for Medical Visual Question Answering
by: Thakur, Rakesh, et al.
Published: (2025)
Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering
by: Hagen, Luca, et al.
Published: (2026)
Saliency Guided Longitudinal Medical Visual Question Answering
by: Wu, Jialin, et al.
Published: (2025)
Free Form Medical Visual Question Answering in Radiology
by: Narayanan, Abhishek, et al.
Published: (2024)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions
by: Zong, Chang, et al.
Published: (2025)
TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering
by: Rajkhowa, Tonmoy, et al.
Published: (2024)
Integrating Object Interaction Self-Attention and GAN-Based Debiasing for Visual Question Answering
by: Li, Zhifei, et al.
Published: (2025)
Location-Aware Pretraining for Medical Difference Visual Question Answering
by: Musinguzi, Denis, et al.
Published: (2026)
MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
by: Mao, Xianwei, et al.
Published: (2026)
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
by: Shen, Ruoyue, et al.
Published: (2024)
SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering
by: Zhang, Yan, et al.
Published: (2025)
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
by: Li, Bin, et al.
Published: (2022)
Lagrange Duality and Compound Multi-Attention Transformer for Semi-Supervised Medical Image Segmentation
by: Zheng, Fuchen, et al.
Published: (2024)
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
by: Li, Guangyao, et al.
Published: (2024)
Object Attribute Matters in Visual Question Answering
by: Li, Peize, et al.
Published: (2023)
Cause-Effect Driven Optimization for Robust Medical Visual Question Answering with Language Biases
by: Zhu, Huanjia, et al.
Published: (2025)
Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)
CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering
by: Cai, Yuliang, et al.
Published: (2024)
Parameter-Efficient VLMs for Gastrointestinal Endoscopy: Medical Image Generation and Clinical Visual Question Answering
by: Peter, Ojonugwa Oluwafemi Ejiga, et al.
Published: (2026)
M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding
by: Liu, Shenxi, et al.
Published: (2025)
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
by: Li, Xu, et al.
Published: (2025)
VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)
Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2025)
VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)
FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering
by: Tong, Chaodong, et al.
Published: (2026)
Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering
by: Felizzi, Federico, et al.
Published: (2025)
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)
Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering
by: Ahir, Param, et al.
Published: (2023)
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention
by: Liu, Ying, et al.
Published: (2024)
Exploring Diverse Methods in Visual Question Answering
by: Li, Panfeng, et al.
Published: (2024)
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering
by: Shourya, Aditya, et al.
Published: (2025)
EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering
by: Li, Yanjun, et al.
Published: (2025)
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
by: Guo, Danfeng, et al.
Published: (2024)
LOVA3: Learning to Visual Question Answering, Asking and Assessment
by: Zhao, Henry Hengyuan, et al.
Published: (2024)
IIU: Independent Inference Units for Knowledge-based Visual Question Answering
by: Li, Yili, et al.
Published: (2024)