Bibliographic Details
Main Authors: Ling, Jun; Qi, Yao; Huang, Tao; Zhou, Shibo; Huang, Yanqin; Yang, Jiang; Song, Ziqi; Zhou, Ying; Yang, Yang; Shen, Heng Tao; Wang, Peng
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2509.17589
Abstract:
  • In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately handling complex tables (those with large sizes, deeply nested structures, and semantically rich or irregular cell content), where existing methods often fail. We begin with a comprehensive analysis that identifies the key challenges and highlights the limitations of current evaluation protocols. To overcome these issues, we propose a reinforced multimodal large language model (MLLM) framework in which a pre-trained MLLM is fine-tuned on a large-scale table-to-LaTeX dataset. To further improve generation quality, we introduce a dual-reward reinforcement learning strategy based on Group Relative Policy Optimization (GRPO). Unlike standard approaches that optimize purely over text outputs, our method combines a structure-level reward on the LaTeX code with a visual fidelity reward computed from rendered outputs, enabling direct optimization of visual output quality. We adopt a hybrid evaluation protocol combining TEDS-Structure and CW-SSIM, and show that our method achieves state-of-the-art performance, particularly on structurally complex tables.
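The dual-reward GRPO idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted-sum combination, the weights, the function names, and the example scores are all assumptions made for illustration, since the abstract does not say how the two rewards are combined. The only part taken from GRPO itself is the group-relative step, where each sampled output's reward is standardized against the other outputs sampled for the same input.

```python
from statistics import mean, pstdev


def combined_reward(structure_score, visual_score, w_structure=0.5, w_visual=0.5):
    """Hypothetical weighted sum of the two reward signals.

    The weights and the additive form are assumptions; the abstract only
    states that a structure-level and a visual fidelity reward are both used.
    """
    return w_structure * structure_score + w_visual * visual_score


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampling group.

    Population standard deviation is used here as an assumption;
    eps guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical LaTeX candidates sampled for one table image,
# each scored as (structure, visual) in [0, 1]. The scores are made up.
candidates = [(0.9, 0.8), (0.6, 0.7), (0.9, 0.4), (0.3, 0.2)]
rewards = [combined_reward(s, v) for s, v in candidates]
advantages = group_relative_advantages(rewards)
```

In this sketch a candidate with correct structure but a poor rendering (the third one) receives no more credit than a candidate that is moderately good on both signals, which is the kind of behavior a reward computed from rendered output is meant to encourage.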