:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Wenjie, Wu, Wei, Liu, Ying, Zhao, Yuan, Lv, Xiaole, Diao, Liang, Fan, Zengjian, Xie, Wenfeng, Lin, Ziling, Shi, De, Huang, Lin, Xu, Kaihe, Li, Hong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.06402

Similar Items

Team PA-VCG's Solution for Competition on Understanding Chinese College Entrance Exam Papers in ICDAR'25
by: Wu, Wei, et al.
Published: (2025)

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
by: Ouyang, Linke, et al.
Published: (2024)

DocPTBench: Benchmarking End-to-End Photographed Document Parsing and Translation
by: Du, Yongkun, et al.
Published: (2025)

DocFusion: A Unified Framework for Document Parsing Tasks
by: Chai, Mingxu, et al.
Published: (2024)

SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding
by: Ding, Chuanghao, et al.
Published: (2024)

DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
by: Zhang, Jiaxin, et al.
Published: (2024)

Reinforcement Learning with Token-level Feedback for Controllable Text Generation
by: Li, Wendi, et al.
Published: (2024)

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing
by: Yang, Minglai, et al.
Published: (2026)

Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research
by: Dong, Kuicai, et al.
Published: (2025)

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
by: Ma, Yubo, et al.
Published: (2024)

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
by: Cui, Cheng, et al.
Published: (2026)

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
by: Yan, Hao, et al.
Published: (2026)

Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning
by: Mo, Ye, et al.
Published: (2025)

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
by: Yang, Rui, et al.
Published: (2025)

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
by: Cui, Cheng, et al.
Published: (2025)

SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding
by: Xu, Pengxin, et al.
Published: (2026)

DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation
by: Mohammadshirazi, Ahmad, et al.
Published: (2024)

PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding
by: Huang, Kui, et al.
Published: (2025)

DocRefine: An Intelligent Framework for Scientific Document Understanding and Content Optimization based on Multimodal Large Model Agents
by: Qian, Kun, et al.
Published: (2025)

PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
by: Ni, Feng, et al.
Published: (2025)

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding
by: Chen, Ketong, et al.
Published: (2025)

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
by: Tanaka, Ryota, et al.
Published: (2024)

CogDoc: Towards Unified thinking in Documents
by: Xu, Qixin, et al.
Published: (2025)

AgriGPT-VL: Agricultural Vision-Language Understanding Suite
by: Yang, Bo, et al.
Published: (2025)

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
by: Cui, Cheng, et al.
Published: (2026)

DocLens: A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding
by: Zhu, Dawei, et al.
Published: (2025)

Mixed norm estimates for dilated averages over planar curves
by: Li, Junfeng, et al.
Published: (2025)

A class of linear operators on Bergman spaces
by: Lou, Zengjian, et al.
Published: (2025)

Parse Graph-Based Visual-Language Interaction for Human Pose Estimation
by: Liu, Shibang, et al.
Published: (2025)

AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering
by: Wang, Ziqing, et al.
Published: (2025)

Basic Cycle Ratio: Cost-Effective Ranking of Influential Spreaders from Local and Global Perspectives
by: Zheng, Wenxin, et al.
Published: (2025)

DocTer: Documentation Guided Fuzzing for Testing Deep Learning API Functions
by: Xie, Danning, et al.
Published: (2021)

DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)

Fleming-VL: Towards Universal Medical Visual Reasoning with Multimodal LLMs
by: Shu, Yan, et al.
Published: (2025)

DocTER: Evaluating Document-based Knowledge Editing
by: Wu, Suhang, et al.
Published: (2023)

Real5-OmniDocBench: A Full-Scale Physical Reconstruction Benchmark for Robust Document Parsing in the Wild
by: Zhou, Changda, et al.
Published: (2026)

How Far Is Document Parsing from Solved? PureDocBench: A Source-Traceable Benchmark across Clean, Degraded, and Real-World Settings
by: Li, Zhiheng, et al.
Published: (2026)

A FEDformer-Based Hybrid Framework for Anomaly Detection and Risk Forecasting in Financial Time Series
by: Fan, Ziling, et al.
Published: (2025)

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
by: Van Landeghem, Jordy, et al.
Published: (2024)

DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal
by: Liu, Wenjie, et al.
Published: (2025)