:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Zhilin; Wu, Fangyu
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.00479

Similar Items

Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
by: Thai, Triet Minh, et al.
Published: (2023)

Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation
by: Nikandrou, Malvina, et al.
Published: (2024)

Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration
by: Nguyen, Ngoc Son, et al.
Published: (2024)

Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
by: Zhang, Zhilin, et al.
Published: (2024)

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)

A Simple LLM Framework for Long-Range Video Question-Answering
by: Zhang, Ce, et al.
Published: (2023)

Joint Extraction Matters: Prompt-Based Visual Question Answering for Multi-Field Document Information Extraction
by: Loem, Mengsay, et al.
Published: (2025)

Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering
by: Xue, Junxiao, et al.
Published: (2024)

Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)

Computed Tomography Visual Question Answering with Cross-modal Feature Graphing
by: Tian, Yuanhe, et al.
Published: (2025)

Harmonizing Feature Maps: A Graph Convolutional Approach for Enhancing Adversarial Robustness
by: Zhang, Kejia, et al.
Published: (2024)

Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)

Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)

Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)

Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)

Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)

Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)

VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
by: Wang, Yanling, et al.
Published: (2025)

Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)

Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)

Object Retrieval for Visual Question Answering with Outside Knowledge
by: Kan, Shichao, et al.
Published: (2024)

VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)

Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering
by: Tao, Qian, et al.
Published: (2025)

Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)

Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)

DriveLM: Driving with Graph Visual Question Answering
by: Sima, Chonghao, et al.
Published: (2023)

Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)

Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)

Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)

Saliency Guided Longitudinal Medical Visual Question Answering
by: Wu, Jialin, et al.
Published: (2025)

Visual and Textual Prompts in VLLMs for Enhancing Emotion Recognition
by: Wang, Zhifeng, et al.
Published: (2025)

Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
by: Romero, David, et al.
Published: (2024)

SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)

Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)

QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning
by: Xu, Quanxing, et al.
Published: (2025)

Reconstruction as a Bridge for Event-Based Visual Question Answering
by: Lou, Hanyue, et al.
Published: (2025)

FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues
by: Li, Shuang, et al.
Published: (2024)

A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
by: Kabir, Raihan, et al.
Published: (2024)

SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
by: Yang, Tianyu, et al.
Published: (2024)

Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)