| Main Authors: | Li, Yili; Yu, Jing; Gai, Keke; Xiong, Gang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.07989 |
Similar Items
A Knowledge Noise Mitigation Framework for Knowledge-based Visual Question Answering
by: Liu, Zhiyue, et al.
Published: (2025)
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval
by: Li, Yili, et al.
Published: (2024)
Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering
by: Ahir, Param, et al.
Published: (2023)
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)
Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
by: Choi, Changin, et al.
Published: (2025)
StaR-KVQA: Structured Reasoning Traces for Implicit-Knowledge Visual Question Answering
by: Wen, Zhihao, et al.
Published: (2025)
MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
by: Mao, Xianwei, et al.
Published: (2026)
Object Attribute Matters in Visual Question Answering
by: Li, Peize, et al.
Published: (2023)
ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering
by: Compagnoni, Alberto, et al.
Published: (2025)
WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata
by: Shbita, Basel, et al.
Published: (2026)
Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
by: Cocchi, Federico, et al.
Published: (2024)
VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering
by: Naeem, Awais, et al.
Published: (2024)
QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering
by: Jiang, Zhuohang, et al.
Published: (2025)
Find The Gap: Knowledge Base Reasoning For Visual Question Answering
by: Barezi, Elham J., et al.
Published: (2024)
VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)
Free Form Medical Visual Question Answering in Radiology
by: Narayanan, Abhishek, et al.
Published: (2024)
Saliency Guided Longitudinal Medical Visual Question Answering
by: Wu, Jialin, et al.
Published: (2025)
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)
TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering
by: Rajkhowa, Tonmoy, et al.
Published: (2024)
Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion
by: Zhang, Junkai, et al.
Published: (2025)
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
by: Shen, Zhixuan, et al.
Published: (2024)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
by: Marouf, Imad Eddine, et al.
Published: (2025)
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)
Variational Visual Question Answering for Uncertainty-Aware Selective Prediction
by: Wieczorek, Tobias Jan, et al.
Published: (2025)
Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering
by: Hagen, Luca, et al.
Published: (2026)
Location-Aware Pretraining for Medical Difference Visual Question Answering
by: Musinguzi, Denis, et al.
Published: (2026)
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
by: Chen, Pingyi, et al.
Published: (2024)
CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering
by: Cai, Yuliang, et al.
Published: (2024)
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention
by: Liu, Ying, et al.
Published: (2024)
M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering
by: Ma, Jiatong, et al.
Published: (2026)
LingoQA: Visual Question Answering for Autonomous Driving
by: Marcu, Ana-Maria, et al.
Published: (2023)
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
by: Shen, Ruoyue, et al.
Published: (2024)
Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring
by: Vu, Sinh Trong, et al.
Published: (2025)
Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering
by: Jain, Riddhi, et al.
Published: (2025)
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering
by: Shourya, Aditya, et al.
Published: (2025)
Exploring Diverse Methods in Visual Question Answering
by: Li, Panfeng, et al.
Published: (2024)
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
by: Hong, Yuyang, et al.
Published: (2025)