Bibliographic Details
Main Authors: Dutta, Avik; Nigam, Harshit; Hasanbeig, Hosein; Radhakrishna, Arjun; Gulwani, Sumit
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.05009
_version_ 1866912810198368256
author Dutta, Avik
Nigam, Harshit
Hasanbeig, Hosein
Radhakrishna, Arjun
Gulwani, Sumit
contents We investigate how large language models (LLMs) fail when tabular data in an otherwise canonical representation is subjected to semantic and structural distortions. Our findings reveal that LLMs lack an inherent ability to detect and correct subtle distortions in table representations. Only when given an explicit prior via a system prompt do models partially adjust their reasoning strategies and correct some distortions, though not consistently or completely. To study this phenomenon, we introduce a small, expert-curated dataset that evaluates LLMs on table question answering (TQA) tasks requiring an additional error-correction step prior to analysis. Our results reveal systematic differences in how LLMs ingest and interpret tabular information under distortion, with even state-of-the-art (SoTA) models such as GPT-5.2 exhibiting an accuracy drop of at least 22% under distortion. These findings raise important questions for future research, particularly regarding when and how models should autonomously decide to realign tabular inputs, as humans do, without relying on explicit prompts or tabular data pre-processing.
format Preprint
id arxiv_https___arxiv_org_abs_2601_05009
institution arXiv
publishDate 2026
record_format arxiv
title An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions
topic Artificial Intelligence
url https://arxiv.org/abs/2601.05009