| Main Authors: | Lassoued, Aymen, Souibgui, Mohamed Ali, Kessentini, Yousri |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.02438 |
Similar Items
CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition
by: Dhiaf, Marwa, et al.
Published: (2023)
DocVXQA: Context-Aware Visual Explanations for Document Question Answering
by: Souibgui, Mohamed Ali, et al.
Published: (2025)
Privacy-Aware Document Visual Question Answering
by: Tito, Rubèn, et al.
Published: (2023)
Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)
Machine Unlearning for Document Classification
by: Kang, Lei, et al.
Published: (2024)
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)
VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning
by: Huang, Muye, et al.
Published: (2024)
Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions
by: Indrehus, Kjetil, et al.
Published: (2026)
Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
by: Xu, Tao
Published: (2026)
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
by: Lim, Su Hyeon, et al.
Published: (2024)
MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
by: Peng, Jingwei, et al.
Published: (2025)
AVIR: Adaptive Visual In-Document Retrieval for Efficient Multi-Page Document Question Answering
by: Li, Zongmin, et al.
Published: (2026)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)
Elevating Visual Question Answering through Implicitly Learned Reasoning Pathways in LVLMs
by: Jing, Liu, et al.
Published: (2025)
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
by: Tran, Duong T., et al.
Published: (2025)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
by: Wang, Zining, et al.
Published: (2025)
One missing piece in Vision and Language: A Survey on Comics Understanding
by: Vivoli, Emanuele, et al.
Published: (2024)
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
by: Dhouib, Mohamed, et al.
Published: (2025)
MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)
Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
Leveraging Contrastive Learning for a Similarity-Guided Tampered Document Data Generation Pipeline
by: Dhouib, Mohamed, et al.
Published: (2026)
Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)
STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
by: Ishihara, Keishi, et al.
Published: (2025)
D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning
by: Tang, Changli, et al.
Published: (2026)
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
by: Kang, Lei, et al.
Published: (2024)
Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering
by: Pintore, Marco, et al.
Published: (2025)
ComicsPAP: understanding comic strips by picking the correct panel
by: Vivoli, Emanuele, et al.
Published: (2025)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)
Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
by: Choi, Changin, et al.
Published: (2025)
Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering
by: Jain, Riddhi, et al.
Published: (2025)
Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering
by: Chen, Zhuohong, et al.
Published: (2026)
Detecting and Understanding Hateful Contents in Memes Through Captioning and Visual Question-Answering
by: Anaissi, Ali, et al.
Published: (2025)
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
by: Romero, David, et al.
Published: (2024)
See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering
by: Wang, Junjie, et al.
Published: (2025)
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
by: Pham, Tan-Hanh, et al.
Published: (2024)