Bibliographic Details
Main Authors: Long, Rujiao, Xing, Hangdi, Yang, Zhibo, Zheng, Qi, Yu, Zhi, Yao, Cong, Huang, Fei
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2401.01522
author Long, Rujiao
Xing, Hangdi
Yang, Zhibo
Zheng, Qi
Yu, Zhi
Yao, Cong
Huang, Fei
contents Table structure recognition (TSR) aims to convert tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or by learning to generate the corresponding markup sequences directly from the table images. However, existing approaches either rely on additional heuristic rules to recover the table structures or struggle to capture long-range dependencies within tables, resulting in increased complexity. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses the logical locations as well as the spatial locations of table cells in a unified network. LORE is conceptually simpler, easier to train, and more accurate than other paradigms of TSR. Moreover, inspired by the success of pre-trained models on a number of computer vision and natural language processing tasks, we propose two pre-training tasks to enrich the spatial and logical representations at the feature level of LORE, resulting in an upgraded version called LORE++. Pre-training gives LORE++ a substantial improvement in accuracy, generalization, and few-shot capability over its predecessor. Experiments on standard benchmarks against methods of previous paradigms demonstrate the superiority of LORE++, which highlights the promise of the logical location regression paradigm for TSR.
format Preprint
id arxiv_https___arxiv_org_abs_2401_01522
institution arXiv
publishDate 2024
record_format arxiv
title LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2401.01522