Staff View: A Theory for Length Generalization in Learning to Reason :: Library Catalog

Bibliographic Details
Main Authors:	Xiao, Changnan; Liu, Bing
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2404.00560

_version_	1866917627303034880
author	Xiao, Changnan; Liu, Bing
author_facet	Xiao, Changnan; Liu, Bing
contents	Length generalization (LG) is a challenging problem in learning to reason. It refers to the phenomenon that when trained on reasoning problems of smaller lengths or sizes, the resulting model struggles with problems of larger sizes or lengths. Although LG has been studied by many researchers, the challenge remains. This paper proposes a theoretical study of LG for problems whose reasoning processes can be modeled as DAGs (directed acyclic graphs). The paper first identifies and proves the conditions under which LG can be achieved in learning to reason. It then designs problem representations based on the theory to learn to solve challenging reasoning problems like parity, addition, and multiplication, using a Transformer to achieve perfect LG.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_00560
institution	arXiv
publishDate	2024
record_format	arxiv
title	A Theory for Length Generalization in Learning to Reason
topic	Artificial Intelligence
url	https://arxiv.org/abs/2404.00560
