| Main Authors: | Lompo, Boammani Aser, Haraoui, Marc |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.07966 |
Similar Items
Multi-objective Representation for Numbers in Clinical Narratives: A CamemBERT-Bio-Based Alternative to Large-Scale LLMs
by: Lompo, Boammani Aser, et al.
Published: (2024)
TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity
by: Yang, Zheyuan, et al.
Published: (2026)
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
by: Yu, Suhao, et al.
Published: (2025)
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
by: Mathur, Suyash Vardhan, et al.
Published: (2024)
DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
by: Wang, Yanling, et al.
Published: (2025)
ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
by: Liang, Jianxin, et al.
Published: (2025)
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
by: Chung, Jiwan, et al.
Published: (2024)
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography
by: Fang, I-Sheng, et al.
Published: (2025)
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
by: Zuo, Yuxin, et al.
Published: (2025)
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems
by: Zhu, Zifeng, et al.
Published: (2024)
Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
by: Zhu, Yingjie, et al.
Published: (2024)
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
by: Wu, Yin, et al.
Published: (2025)
Vero: An Open RL Recipe for General Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2026)
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation
by: Chen, Kaijie, et al.
Published: (2025)
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
by: Kamoi, Ryo, et al.
Published: (2024)
CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding
by: Ung, Huy Quang, et al.
Published: (2025)
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
by: Fang, Rongyao, et al.
Published: (2025)
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)
KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts
by: Hwang, Taebaek, et al.
Published: (2025)
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
by: Luo, Yuxuan, et al.
Published: (2025)
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
by: Li, Hengzhi, et al.
Published: (2025)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs
by: Mayer, Julius, et al.
Published: (2025)
Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs
by: Anand, Dhruv, et al.
Published: (2025)
DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
by: Zhong, Yiwu, et al.
Published: (2024)
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
by: Zhao, Yilun, et al.
Published: (2025)
TempCore: Are Video QA Benchmarks Temporally Grounded? A Frame Selection Sensitivity Analysis and Benchmark
by: Ok, Hyunjong, et al.
Published: (2025)
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
by: Huang, Yixu, et al.
Published: (2026)
Seeing Culture: A Benchmark for Visual Reasoning and Grounding
by: Satar, Burak, et al.
Published: (2025)
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
by: Deng, Naihao, et al.
Published: (2024)
Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning
by: Dong, Qihua, et al.
Published: (2026)
MaRVL-QA: A Benchmark for Mathematical Reasoning over Visual Landscapes
by: Pande, Nilay, et al.
Published: (2025)
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
by: Bai, Tianyi, et al.
Published: (2025)
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
by: Hsiao, Yu-Chung, et al.
Published: (2022)
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
by: Wang, Andong, et al.
Published: (2024)
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model
by: Kim, Taehee, et al.
Published: (2024)