Bibliographic Details
Main Authors: Heidenreich, Hunter; Dalvi, Ratish; Mukku, Rohith; Verma, Nikhil; Pičuljan, Neven
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2408.11981
Table of Contents:
  • Page Stream Segmentation (PSS) is an essential prerequisite for automated document processing at scale. However, research progress has been limited by the absence of realistic public benchmarks. This paper works towards addressing this gap by introducing TABME++, an enhanced benchmark featuring commercial Optical Character Recognition (OCR) annotations. We evaluate the performance of large language models (LLMs) on PSS, focusing on decoder-based models fine-tuned with parameter-efficient methods. Our results show that decoder-based LLMs outperform smaller multimodal encoders. Through a review of existing PSS research and datasets, we identify key challenges and advancements in the field. Our findings highlight the importance of robust OCR, providing valuable insights for the development of more effective document processing systems.