Staff View: Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning

Bibliographic Details
Main Authors:	Gupta, Kavi; Sanders, Kate; Solar-Lezama, Armando
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2501.02825

_version_	1866909779928023040
author	Gupta, Kavi; Sanders, Kate; Solar-Lezama, Armando
author_facet	Gupta, Kavi; Sanders, Kate; Solar-Lezama, Armando
contents	While LLMs have revolutionized the field of machine learning due to their high performance on a strikingly wide range of problems, they are also known to hallucinate false answers and underperform on less canonical versions of the same tasks. There are several emerging theories of LLM performance, among them that LLMs lack world modeling ability, that they have an undesirable bias towards an autoregressive prior, and that they struggle on more novel problems. The existing literature on LLM input novelty has focused on tasks of relatively high complexity, studying perturbations of canonical but complex problems. In this paper, we attempt to minimize complexity in order to isolate novelty as a factor in LLM underperformance and investigate the power of in-context learning (ICL). To this end, we consider an extremely simple domain: next-token prediction on simple language tasks. The twist is that these language tasks are wholly unseen, as they are randomly drawn from a large, parsimoniously defined set of languages arising from simple grammar rules. This experimental setup allows us to evaluate ICL independently of models' parametric knowledge. We find that LLMs uniformly underperform n-gram models on this task, both when used as next-token predictors and in chain-of-thought.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_02825
institution	arXiv
publishDate	2025
record_format	arxiv
title	Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning
topic	Machine Learning
url	https://arxiv.org/abs/2501.02825

Similar Items