:: Library Catalog

Bibliographic Details
Main Authors:	Lassoued, Aymen; Souibgui, Mohamed Ali; Kessentini, Yousri
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.02438

Similar Items

CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition
by: Dhiaf, Marwa, et al.
Published: (2023)

DocVXQA: Context-Aware Visual Explanations for Document Question Answering
by: Souibgui, Mohamed Ali, et al.
Published: (2025)

Privacy-Aware Document Visual Question Answering
by: Tito, Rubèn, et al.
Published: (2023)

Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)

Machine Unlearning for Document Classification
by: Kang, Lei, et al.
Published: (2024)

Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)

VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning
by: Huang, Muye, et al.
Published: (2024)

Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions
by: Indrehus, Kjetil, et al.
Published: (2026)

Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
by: Xu, Tao
Published: (2026)

Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
by: Lim, Su Hyeon, et al.
Published: (2024)

MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
by: Peng, Jingwei, et al.
Published: (2025)

AVIR: Adaptive Visual In-Document Retrieval for Efficient Multi-Page Document Question Answering
by: Li, Zongmin, et al.
Published: (2026)

Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)

Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)

Elevating Visual Question Answering through Implicitly Learned Reasoning Pathways in LVLMs
by: Jing, Liu, et al.
Published: (2025)

Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)

ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
by: Tran, Duong T., et al.
Published: (2025)

Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)

Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
by: Wang, Zining, et al.
Published: (2025)

One missing piece in Vision and Language: A Survey on Comics Understanding
by: Vivoli, Emanuele, et al.
Published: (2024)

PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
by: Dhouib, Mohamed, et al.
Published: (2025)

MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)

Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)

Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)

Leveraging Contrastive Learning for a Similarity-Guided Tampered Document Data Generation Pipeline
by: Dhouib, Mohamed, et al.
Published: (2026)

Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)

STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
by: Ishihara, Keishi, et al.
Published: (2025)

D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning
by: Tang, Changli, et al.
Published: (2026)

Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
by: Kang, Lei, et al.
Published: (2024)

Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering
by: Pintore, Marco, et al.
Published: (2025)

ComicsPAP: understanding comic strips by picking the correct panel
by: Vivoli, Emanuele, et al.
Published: (2025)

Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)

Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
by: Choi, Changin, et al.
Published: (2025)

Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering
by: Jain, Riddhi, et al.
Published: (2025)

Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering
by: Chen, Zhuohong, et al.
Published: (2026)

Detecting and Understanding Hateful Contents in Memes Through Captioning and Visual Question-Answering
by: Anaissi, Ali, et al.
Published: (2025)

Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
by: Romero, David, et al.
Published: (2024)

See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering
by: Wang, Junjie, et al.
Published: (2025)

SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
by: Pham, Tan-Hanh, et al.
Published: (2024)