| Main Authors: | Zhang, Tong; Lin, Honglin; Liu, Zhou; Chen, Chong; Zhang, Wentao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09809 |
Similar Items
SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations
by: Lin, Jamie Menjay, et al.
Published: (2024)
HiSciBench: A Hierarchical Multi-disciplinary Benchmark for Scientific Intelligence from Reading to Discovery
by: Zhang, Yaping, et al.
Published: (2025)
ParseBench: A Document Parsing Benchmark for AI Agents
by: Zhang, Boyang, et al.
Published: (2026)
Geoparsing: Diagram Parsing for Plane and Solid Geometry with a Unified Formal Language
by: Wang, Peijie, et al.
Published: (2026)
LongInsightBench: A Comprehensive Benchmark for Evaluating Omni-Modal Models on Human-Centric Long-Video Understanding
by: Han, ZhaoYang, et al.
Published: (2025)
SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
by: Guo, Longteng, et al.
Published: (2026)
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)
DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM
by: Zhang, Qintong, et al.
Published: (2025)
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning
by: Lin, Juekai, et al.
Published: (2026)
Robust Diagram Reasoning: A Framework for Enhancing LVLM Performance on Visually Perturbed Scientific Diagrams
by: Zhou, Minghao, et al.
Published: (2025)
RoomPilot: Controllable Indoor Scene Synthesis via Multimodal Semantic Parsing
by: Chen, Wentang, et al.
Published: (2025)
ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing
by: Shah, Ayush Kumar, et al.
Published: (2023)
RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning
by: Song, Jiahe, et al.
Published: (2025)
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding
by: Yin, Kun, et al.
Published: (2026)
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
by: Lin, Honglin, et al.
Published: (2026)
S-RRG-Bench: Structured Radiology Report Generation with Fine-Grained Evaluation Framework
by: Li, Yingshu, et al.
Published: (2025)
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
by: Zhang, Yuhan, et al.
Published: (2025)
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
by: Liu, Parker, et al.
Published: (2025)
SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection
by: Hu, You, et al.
Published: (2026)
Towards Real-World Document Parsing via Realistic Scene Synthesis and Document-Aware Training
by: Li, Gengluo, et al.
Published: (2026)
OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities
by: Zhang, Peirong, et al.
Published: (2025)
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
by: Li, Zhang, et al.
Published: (2025)
Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation
by: Li, Lin, et al.
Published: (2025)
TechImage-Bench: Rubric-Based Evaluation for Technical Image Generation
by: Ni, Minheng, et al.
Published: (2025)
SciDraw-6K: A Multilingual Scientific Illustration Dataset Generated by Google Gemini
by: Chen, Davie
Published: (2026)
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
by: Zhang, Qintong, et al.
Published: (2024)
CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
by: Chen, Weidong, et al.
Published: (2026)
Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing
by: Song, Jiahe, et al.
Published: (2026)
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters
by: Tanaka, Shohei, et al.
Published: (2025)
AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
by: Wang, Zhengren, et al.
Published: (2026)
Diagram-Driven Course Questions Generation
by: Zhang, Xinyu, et al.
Published: (2024)
SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model
by: Chang, Yifan, et al.
Published: (2025)
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
by: Ouyang, Linke, et al.
Published: (2024)
Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation
by: Zhu, Fanwei, et al.
Published: (2025)
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
by: Tang, Yuqi, et al.
Published: (2026)
KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding
by: Lin, Boda, et al.
Published: (2026)
Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation
by: Li, Niantong, et al.
Published: (2026)
Occlusion-Aware Deep Convolutional Neural Network via Homogeneous Tanh-transforms for Face Parsing
by: Qiua, Jianhua, et al.
Published: (2023)
Real5-OmniDocBench: A Full-Scale Physical Reconstruction Benchmark for Robust Document Parsing in the Wild
by: Zhou, Changda, et al.
Published: (2026)
SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding
by: Xu, Pengxin, et al.
Published: (2026)