| Main Authors: | Zhou, Minghao, Souza, Rafael, Hu, Yaqian, Che, Luming |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.16972 |
Similar Items
OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning
by: Lu, Taiting, et al.
Published: (2026)
DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)
SciFlow-Bench: Evaluating Structure-Aware Scientific Diagram Generation via Inverse Parsing
by: Zhang, Tong, et al.
Published: (2026)
DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2026)
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
by: Sun, Yanpeng, et al.
Published: (2025)
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
by: Hwang, Yerin, et al.
Published: (2025)
Hierarchical Contextual Grounding LVLM: Enhancing Fine-Grained Visual-Language Understanding with Robust Grounding
by: Guo, Leilei, et al.
Published: (2025)
Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver
by: Zhang, Zeren, et al.
Published: (2024)
HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks
by: Zhang, Fengji, et al.
Published: (2024)
Diagram-Driven Course Questions Generation
by: Zhang, Xinyu, et al.
Published: (2024)
RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning
by: Song, Jiahe, et al.
Published: (2025)
EvoDiagram: Agentic Editable Diagram Creation via Design Expertise Evolution
by: Wang, Tianfu, et al.
Published: (2026)
Fighting Hallucinations with Counterfactuals: Diffusion-Guided Perturbations for LVLM Hallucination Suppression
by: Dastmalchi, Hamidreza, et al.
Published: (2026)
Soft Anisotropic Diagrams for Differentiable Image Representation
by: Iinbor, Laki, et al.
Published: (2026)
Temporally Grounding Instructional Diagrams in Unconstrained Videos
by: Zhang, Jiahao, et al.
Published: (2024)
Historical Astronomical Diagrams Decomposition in Geometric Primitives
by: Kalleli, Syrine, et al.
Published: (2024)
Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing
by: Song, Jiahe, et al.
Published: (2026)
GenAI-DrawIO-Creator: A Framework for Automated Diagram Generation
by: Yu, Jinze, et al.
Published: (2026)
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
by: Liu, Zhenyang, et al.
Published: (2025)
TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram
by: Lei, Mingxi, et al.
Published: (2024)
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
by: Zhang, Jiahao, et al.
Published: (2023)
Structure Diagram Recognition in Financial Announcements
by: Qiao, Meixuan, et al.
Published: (2023)
Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Optical Remote Sensing
by: Zhang, Xu, et al.
Published: (2025)
Visual Language Model as a Judge for Object Detection in Industrial Diagrams
by: Ghosh, Sanjukta
Published: (2025)
Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward
by: Fan, Zhiyuan, et al.
Published: (2025)
GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field
by: Zhang, Chengrui, et al.
Published: (2025)
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
by: Zhao, Yilun, et al.
Published: (2025)
From Engineering Diagrams to Graphs: Digitizing P&IDs with Transformers
by: Stürmer, Jan Marius, et al.
Published: (2024)
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
by: Zhu, Lanyun, et al.
Published: (2025)
MagicGeo: Training-Free Text-Guided Geometric Diagram Generation
by: Wang, Junxiao, et al.
Published: (2025)
ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing
by: Shah, Ayush Kumar, et al.
Published: (2023)
Enginuity: Building an Open Multi-Domain Dataset of Complex Engineering Diagrams
by: Seefried, Ethan, et al.
Published: (2026)
LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition
by: Wang, Teng, et al.
Published: (2024)
LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection
by: Liu, Qingyuan, et al.
Published: (2025)
Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models
by: Sasaki, Hiroshi
Published: (2026)
Modular Graph Extraction for Handwritten Circuit Diagram Images
by: Bayer, Johannes, et al.
Published: (2024)
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)
GeoLoom: High-quality Geometric Diagram Generation from Textual Input
by: Wei, Xiaojing, et al.
Published: (2025)
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
by: Zhang, Jiahao, et al.
Published: (2024)
Geoparsing: Diagram Parsing for Plane and Solid Geometry with a Unified Formal Language
by: Wang, Peijie, et al.
Published: (2026)