Staff View: An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions :: Library Catalog

Bibliographic Details
Main Authors:	Dutta, Avik; Nigam, Harshit; Hasanbeig, Hosein; Radhakrishna, Arjun; Gulwani, Sumit
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.05009

_version_	1866912810198368256
author	Dutta, Avik; Nigam, Harshit; Hasanbeig, Hosein; Radhakrishna, Arjun; Gulwani, Sumit
author_facet	Dutta, Avik; Nigam, Harshit; Hasanbeig, Hosein; Radhakrishna, Arjun; Gulwani, Sumit
contents	We investigate how large language models (LLMs) fail when tabular data in an otherwise canonical representation is subjected to semantic and structural distortions. Our findings reveal that LLMs lack an inherent ability to detect and correct subtle distortions in table representations. Only when given an explicit prior via a system prompt do models partially adjust their reasoning strategies and correct some distortions, though not consistently or completely. To study this phenomenon, we introduce a small, expert-curated dataset that explicitly evaluates LLMs on table question answering (TQA) tasks requiring an additional error-correction step prior to analysis. Our results reveal systematic differences in how LLMs ingest and interpret tabular information under distortion, with even state-of-the-art (SoTA) models such as GPT-5.2 exhibiting an accuracy drop of at least 22% under distortion. These findings raise important questions for future research, particularly regarding when and how models should autonomously decide to realign tabular inputs, as humans do, without relying on explicit prompts or tabular data pre-processing.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_05009
institution	arXiv
publishDate	2026
record_format	arxiv
title	An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions
topic	Artificial Intelligence
url	https://arxiv.org/abs/2601.05009

Similar Items