| Main Authors: | Shaaban, Mai A., Saleem, Tausifa Jan, Papineni, Vijay Ram, Yaqub, Mohammad |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.22900 |
Similar Items
Deep Learning-Based Automated Segmentation of Uterine Myomas
by: Saleem, Tausifa Jan, et al.
Published: (2025)
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
by: Shaaban, Mai A., et al.
Published: (2024)
DuPLUS: Dual-Prompt Vision-Language Framework for Universal Medical Image Segmentation and Prognosis
by: Saeed, Numan, et al.
Published: (2025)
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
by: Ding, Yihao, et al.
Published: (2024)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
MINDSETS: Multi-omics Integration with Neuroimaging for Dementia Subtyping and Effective Temporal Study
by: Hassan, Salma, et al.
Published: (2024)
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation
by: Sanjeev, Santosh, et al.
Published: (2024)
XReal: Realistic Anatomy and Pathology-Aware X-ray Generation via Controllable Diffusion Model
by: Hashmi, Anees Ur Rehman, et al.
Published: (2024)
Multimodal Integration of Human-Like Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
by: Serra, Francesco Dalla, et al.
Published: (2025)
MedSPOT: A Workflow-Aware Sequential Grounding Benchmark for Clinical GUI
by: Shakeel, Rozain, et al.
Published: (2026)
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
by: Long, Xinwei, et al.
Published: (2025)
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
by: Su, Tongkun, et al.
Published: (2024)
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
by: Dong, Kuicai, et al.
Published: (2025)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking
by: Ahmed, Rafid, et al.
Published: (2026)
LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
by: Gu, Tiancheng, et al.
Published: (2024)
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
by: Ha, Cuong Nhat, et al.
Published: (2024)
Knowledge Distillation in Vision Transformers: A Critical Review
by: Habib, Gousia, et al.
Published: (2023)
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs
by: Zhang, Jiarui, et al.
Published: (2023)
Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering
by: Kim, Jongha, et al.
Published: (2026)
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
by: Li, Bin, et al.
Published: (2022)
ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering
by: Kaur, Rachneet, et al.
Published: (2025)
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
by: Wang, Haibo, et al.
Published: (2024)
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights
by: Mishra, Deepali, et al.
Published: (2025)
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance
by: Moradi, Mohammad Mahdi, et al.
Published: (2025)
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
by: Lim, Qi Zhi, et al.
Published: (2025)
Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering
by: Lagos, Maximiliano Hormazábal, et al.
Published: (2025)
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
by: Chen, Peiyuan, et al.
Published: (2024)
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
by: Inadumi, Shun, et al.
Published: (2024)
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
by: Singh, Shubhankar, et al.
Published: (2024)
CommVQA: Situating Visual Question Answering in Communicative Contexts
by: Naik, Nandita Shankar, et al.
Published: (2024)
Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration
by: Nguyen, Ngoc Son, et al.
Published: (2024)
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
by: Ma, Ziyu, et al.
Published: (2024)
Computed Tomography Visual Question Answering with Cross-modal Feature Graphing
by: Tian, Yuanhe, et al.
Published: (2025)
Large Vision-Language Models for Remote Sensing Visual Question Answering
by: Siripong, Surasakdi, et al.
Published: (2024)
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
by: Awal, Rabiul, et al.
Published: (2023)
AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning
by: Zhang, Peifeng, et al.
Published: (2026)
Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation
by: Peng, Daowan, et al.
Published: (2025)