Bibliographic Details
Main Author: Patriota, Alexandre Galvao
Format: Preprint
Published: 2024
Subjects: Machine Learning; Applications
Online Access: https://arxiv.org/abs/2406.00075
author Patriota, Alexandre Galvao
author_facet Patriota, Alexandre Galvao
contents This paper introduces a novel training methodology that enables a Transformer model to generalize the addition of two-digit numbers to numbers of unseen digit lengths. The proposed approach employs an autoregressive generation technique that processes digits from right to left, mimicking a common manual method for adding large numbers. To the best of my knowledge, this methodology has not been previously explored in the literature. All results are reproducible, and the corresponding R code is available at github.com/AGPatriota/ALGA-R/.
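The manual method the abstract alludes to can be sketched as follows. This is only an illustration of right-to-left, digit-by-digit addition with a carry, written in Python; it is not the paper's model or training code (which is in R at the repository above), and the function name `add_right_to_left` is a hypothetical one chosen for this example.

```python
def add_right_to_left(a: str, b: str) -> str:
    """Add two non-negative decimal strings, least significant digit first.

    Illustrative only: mirrors pencil-and-paper addition, where each output
    digit depends on the current digit pair and the carry from the right.
    """
    # Pad the shorter operand with leading zeros so the digits line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    digits = []  # result digits in the order produced (right to left)
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(digit))

    # A final carry becomes the most significant digit.
    if carry:
        digits.append(str(carry))

    # Reverse back into the usual left-to-right reading order.
    return "".join(reversed(digits))


print(add_right_to_left("987", "46"))  # 1033
```

Because each step uses only the current digit pair and a single carry, the same procedure applies unchanged to operands of any length, which is the property that makes right-to-left generation a natural fit for length generalization.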
format Preprint
id arxiv_https___arxiv_org_abs_2406_00075
institution arXiv
publishDate 2024
record_format arxiv
title Arbitrary-Length Generalization for Addition in a Tiny Transformer
topic Machine Learning
Applications
url https://arxiv.org/abs/2406.00075