| Main Authors: | Zhang, Zhilin, Wu, Fangyu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.00479 |
Similar Items
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
by: Thai, Triet Minh, et al.
Published: (2023)
Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation
by: Nikandrou, Malvina, et al.
Published: (2024)
Advancing Vietnamese Visual Question Answering with Transformer and Convolutional Integration
by: Nguyen, Ngoc Son, et al.
Published: (2024)
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
by: Zhang, Zhilin, et al.
Published: (2024)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
A Simple LLM Framework for Long-Range Video Question-Answering
by: Zhang, Ce, et al.
Published: (2023)
Joint Extraction Matters: Prompt-Based Visual Question Answering for Multi-Field Document Information Extraction
by: Loem, Mengsay, et al.
Published: (2025)
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering
by: Xue, Junxiao, et al.
Published: (2024)
Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)
Computed Tomography Visual Question Answering with Cross-modal Feature Graphing
by: Tian, Yuanhe, et al.
Published: (2025)
Harmonizing Feature Maps: A Graph Convolutional Approach for Enhancing Adversarial Robustness
by: Zhang, Kejia, et al.
Published: (2024)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
by: Wang, Yanling, et al.
Published: (2025)
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)
Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)
Object Retrieval for Visual Question Answering with Outside Knowledge
by: Kan, Shichao, et al.
Published: (2024)
VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)
Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering
by: Tao, Qian, et al.
Published: (2025)
Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)
DriveLM: Driving with Graph Visual Question Answering
by: Sima, Chonghao, et al.
Published: (2023)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)
Saliency Guided Longitudinal Medical Visual Question Answering
by: Wu, Jialin, et al.
Published: (2025)
Visual and Textual Prompts in VLLMs for Enhancing Emotion Recognition
by: Wang, Zhifeng, et al.
Published: (2025)
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
by: Romero, David, et al.
Published: (2024)
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
by: Zheng, Peiru, et al.
Published: (2024)
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)
QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning
by: Xu, Quanxing, et al.
Published: (2025)
Reconstruction as a Bridge for Event-Based Visual Question Answering
by: Lou, Hanyue, et al.
Published: (2025)
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues
by: Li, Shuang, et al.
Published: (2024)
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
by: Kabir, Raihan, et al.
Published: (2024)
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
by: Yang, Tianyu, et al.
Published: (2024)
Multi-Sourced Compositional Generalization in Visual Question Answering
by: Li, Chuanhao, et al.
Published: (2025)