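Staff View: Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA :: Library Catalog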

Bibliographic Details
Main Authors:	Li, Fengyu; Zhu, Junhao; Song, Kaishi; Chen, Lu; Yao, Zhongming; Li, Tianyi; Jensen, Christian S.
Format:	Preprint
Published:	2026
Subjects:	Databases; Computation and Language
Online Access:	https://arxiv.org/abs/2602.22721

_version_	1866911558472302592
author	Li, Fengyu; Zhu, Junhao; Song, Kaishi; Chen, Lu; Yao, Zhongming; Li, Tianyi; Jensen, Christian S.
author_facet	Li, Fengyu; Zhu, Junhao; Song, Kaishi; Chen, Lu; Yao, Zhongming; Li, Tianyi; Jensen, Christian S.
contents	Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. However, these solutions rely on multiple LLM calls, resulting in prohibitive latencies and computational costs. We propose Operation-R1, the first framework that trains lightweight LLMs (e.g., Qwen-4B/1.7B) via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step. To train such an LLM, we first introduce a self-supervised rewarding mechanism to automatically obtain fine-grained pipeline-wise supervision signals for LLM training. We also propose variance-aware group resampling to mitigate training instability. To further enhance robustness of pipeline generation, we develop two complementary mechanisms: operation merge, which filters spurious operations through multi-candidate consensus, and adaptive rollback, which offers runtime protection against information loss in data transformation. Experiments on two benchmark datasets show that, with the same LLM backbone, Operation-R1 achieves average absolute accuracy gains of 8.83 and 4.44 percentage points over multi-step preparation baselines, with 79% table compression and a 2.2× reduction in monetary cost.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_22721
institution	arXiv
publishDate	2026
record_format	arxiv
title	Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA
topic	Databases; Computation and Language
url	https://arxiv.org/abs/2602.22721
