Library Catalog

Bibliographic Details
Main Author:	Aguilar, Sergio Torres
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Computation and Language; Databases
Online Access:	https://arxiv.org/abs/2506.20326

Similar Items

LED: A Benchmark for Evaluating Layout Error Detection in Document Analysis
by: Heo, Inbum, et al.
Published: (2026)

BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
by: Yang, Zhantao, et al.
Published: (2024)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)

TRIDIS: A Comprehensive Medieval and Early Modern Corpus for HTR and NER
by: Aguilar, Sergio Torres
Published: (2025)

YOLO Object Detectors for Robotics -- a Comparative Study
by: Niżeniec, Patryk, et al.
Published: (2026)

SODIUM: From Open Web Data to Queryable Databases
by: Hu, Chuxuan, et al.
Published: (2026)

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
by: Zhao, Zhiyuan, et al.
Published: (2024)

Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs
by: Lopez-Duran, Miguel, et al.
Published: (2025)

LAPDoc: Layout-Aware Prompting for Documents
by: Lamott, Marcel, et al.
Published: (2024)

Visually Guided Generative Text-Layout Pre-training for Document Intelligence
by: Mao, Zhiming, et al.
Published: (2024)

Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
by: Celona, Luigi, et al.
Published: (2023)

SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection
by: Hu, Xingjian, et al.
Published: (2024)

ROAP: A Reading-Order and Attention-Prior Pipeline for Optimizing Layout Transformers in Key Information Extraction
by: Xie, Tingwei, et al.
Published: (2026)

Multimodal Neural Databases
by: Trappolini, Giovanni, et al.
Published: (2023)

Assessing the Capability of YOLO- and Transformer-based Object Detectors for Real-time Weed Detection
by: Allmendinger, Alicia, et al.
Published: (2025)

KH-FUNSD: A Hierarchical and Fine-Grained Layout Analysis Dataset for Low-Resource Khmer Business Document
by: Thuon, Nimol, et al.
Published: (2025)

Co-Layout: LLM-driven Co-optimization for Interior Layout
by: Xiang, Chucheng, et al.
Published: (2025)

TWIX: Automatically Reconstructing Structured Data from Templatized Documents
by: Lin, Yiming, et al.
Published: (2025)

LAND: A Longitudinal Analysis of Neuromorphic Datasets
by: Cohen, Gregory, et al.
Published: (2026)

DLAFormer: An End-to-End Transformer For Document Layout Analysis
by: Wang, Jiawei, et al.
Published: (2024)

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
by: Zhu, Wanrong, et al.
Published: (2024)

Accurate Fine-grained Layout Analysis for the Historical Tibetan Document Based on the Instance Segmentation
by: Zhao, Penghai, et al.
Published: (2021)

VideoScoop: A Non-Traditional Domain-Independent Framework For Video Analysis
by: Billah, Hafsa
Published: (2025)

Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
by: Abdallah, Abdelrahman, et al.
Published: (2024)

Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift
by: Xu, Qinwu
Published: (2026)

Unveiling the Pitfalls of Knowledge Editing for Large Language Models
by: Li, Zhoubo, et al.
Published: (2023)

Reference-Based Post-OCR Processing with LLM for Precise Diacritic Text in Historical Document Recognition
by: Do, Thao, et al.
Published: (2024)

CLDA-YOLO: Visual Contrastive Learning Based Domain Adaptive YOLO Detector
by: Qiu, Tianheng, et al.
Published: (2024)

Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown
by: Duan, Changxu
Published: (2025)

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
by: Chan, Adrian, et al.
Published: (2024)

R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment
by: Li, Zhuangzi, et al.
Published: (2026)

Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
by: Zhang, Yuyi, et al.
Published: (2025)

Extract-Transform-Load for Video Streams
by: Kossmann, Ferdinand, et al.
Published: (2023)

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
by: Fan, Yue, et al.
Published: (2024)

LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis
by: Heo, Inbum, et al.
Published: (2025)

A Comparative Study of Continuous Sign Language Recognition Techniques
by: Alyami, Sarah, et al.
Published: (2024)

A Hybrid Approach for Document Layout Analysis in Document images
by: Shehzadi, Tahira, et al.
Published: (2024)

Improving OCR for Historical Texts of Multiple Languages
by: Westerdijk, Hylke, et al.
Published: (2025)

DPCD: A Quality Assessment Database for Dynamic Point Clouds
by: Liu, Yating, et al.
Published: (2025)