| Main Authors: | Wang, Wenjie, Wu, Wei, Liu, Ying, Zhao, Yuan, Lv, Xiaole, Diao, Liang, Fan, Zengjian, Xie, Wenfeng, Lin, Ziling, Shi, De, Huang, Lin, Xu, Kaihe, Li, Hong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.06402 |
Similar Items
Team PA-VCG's Solution for Competition on Understanding Chinese College Entrance Exam Papers in ICDAR'25
by: Wu, Wei, et al.
Published: (2025)
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
by: Ouyang, Linke, et al.
Published: (2024)
DocPTBench: Benchmarking End-to-End Photographed Document Parsing and Translation
by: Du, Yongkun, et al.
Published: (2025)
DocFusion: A Unified Framework for Document Parsing Tasks
by: Chai, Mingxu, et al.
Published: (2024)
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding
by: Ding, Chuanghao, et al.
Published: (2024)
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
by: Zhang, Jiaxin, et al.
Published: (2024)
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
by: Li, Wendi, et al.
Published: (2024)
Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing
by: Yang, Minglai, et al.
Published: (2026)
Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research
by: Dong, Kuicai, et al.
Published: (2025)
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
by: Ma, Yubo, et al.
Published: (2024)
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
by: Cui, Cheng, et al.
Published: (2026)
DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
by: Yan, Hao, et al.
Published: (2026)
Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning
by: Mo, Ye, et al.
Published: (2025)
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
by: Yang, Rui, et al.
Published: (2025)
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
by: Cui, Cheng, et al.
Published: (2025)
SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding
by: Xu, Pengxin, et al.
Published: (2026)
DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation
by: Mohammadshirazi, Ahmad, et al.
Published: (2024)
PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding
by: Huang, Kui, et al.
Published: (2025)
DocRefine: An Intelligent Framework for Scientific Document Understanding and Content Optimization based on Multimodal Large Model Agents
by: Qian, Kun, et al.
Published: (2025)
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
by: Ni, Feng, et al.
Published: (2025)
MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding
by: Chen, Ketong, et al.
Published: (2025)
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
by: Tanaka, Ryota, et al.
Published: (2024)
CogDoc: Towards Unified thinking in Documents
by: Xu, Qixin, et al.
Published: (2025)
AgriGPT-VL: Agricultural Vision-Language Understanding Suite
by: Yang, Bo, et al.
Published: (2025)
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
by: Cui, Cheng, et al.
Published: (2026)
DocLens : A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding
by: Zhu, Dawei, et al.
Published: (2025)
Mixed norm estimates for dilated averages over planar curves
by: Li, Junfeng, et al.
Published: (2025)
A class of linear operators on Bergman spaces
by: Lou, Zengjian, et al.
Published: (2025)
Parse Graph-Based Visual-Language Interaction for Human Pose Estimation
by: Liu, Shibang, et al.
Published: (2025)
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering
by: Wang, Ziqing, et al.
Published: (2025)
Basic Cycle Ratio: Cost-Effective Ranking of Influential Spreaders from Local and Global Perspectives
by: Zheng, Wenxin, et al.
Published: (2025)
DocTer: Documentation Guided Fuzzing for Testing Deep Learning API Functions
by: Xie, Danning, et al.
Published: (2021)
DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)
Fleming-VL: Towards Universal Medical Visual Reasoning with Multimodal LLMs
by: Shu, Yan, et al.
Published: (2025)
DocTER: Evaluating Document-based Knowledge Editing
by: Wu, Suhang, et al.
Published: (2023)
Real5-OmniDocBench: A Full-Scale Physical Reconstruction Benchmark for Robust Document Parsing in the Wild
by: Zhou, Changda, et al.
Published: (2026)
How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings
by: Li, Zhiheng, et al.
Published: (2026)
A FEDformer-Based Hybrid Framework for Anomaly Detection and Risk Forecasting in Financial Time Series
by: Fan, Ziling, et al.
Published: (2025)
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
by: Van Landeghem, Jordy, et al.
Published: (2024)
DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal
by: Liu, Wenjie, et al.
Published: (2025)