Staff View: σ-GPTs: A New Approach to Autoregressive Models :: Library Catalog

Bibliographic Details
Main Authors:	Pannatier, Arnaud; Courdier, Evann; Fleuret, François
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2404.09562

_version_	1866914851807297536
author	Pannatier, Arnaud; Courdier, Evann; Fleuret, François
author_facet	Pannatier, Arnaud; Courdier, Evann; Fleuret, François
contents	Autoregressive models, such as the GPT family, use a fixed order, usually left-to-right, to generate sequences. However, a fixed order is not a necessity. In this paper, we challenge this assumption and show that by simply adding a positional encoding for the output, this order can be modulated on the fly, per sample, which offers several key advantages. It allows sampling of, and conditioning on, arbitrary subsets of tokens, and it also allows multiple tokens to be sampled dynamically in one shot according to a rejection strategy, leading to a sub-linear number of model evaluations. We evaluate our method across various domains, including language modeling, path-solving, and aircraft vertical rate prediction, and reduce the number of steps required for generation by an order of magnitude.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_09562
institution	arXiv
publishDate	2024
record_format	arxiv
title	σ-GPTs: A New Approach to Autoregressive Models
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2404.09562
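The contents field describes the core mechanism: besides its own position, each input token carries the position of the token to be predicted next, so the generation order becomes a per-sample permutation rather than a fixed left-to-right scan. The following minimal Python sketch shows how training steps could be laid out under a given order. The start-token id (`BOS_TOKEN`), the start position marker (`BOS_POS`), the helper names, and the choice to concatenate two half-width sinusoidal encodings (rather than sum them) are all hypothetical choices made for illustration. This is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

BOS_TOKEN = -1  # hypothetical start-of-sequence token id (assumption)
BOS_POS = -1    # hypothetical position marker for the start token (assumption)


def sinusoidal(pos: int, dim: int) -> List[float]:
    """Standard sinusoidal encoding of a single position; dim must be even."""
    enc: List[float] = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return enc


def double_position_encoding(in_pos: int, out_pos: int, dim: int) -> List[float]:
    """Assumption: concatenate half-width encodings of the input token's
    position and of the position the model must predict next."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    return sinusoidal(in_pos, dim // 2) + sinusoidal(out_pos, dim // 2)


def shuffled_steps(
    tokens: Sequence[int], order: Sequence[int]
) -> List[Tuple[int, int, int, int]]:
    """For a generation order (a permutation of positions), return one
    (input_token, input_pos, target_pos, target_token) tuple per step."""
    if sorted(order) != list(range(len(tokens))):
        raise ValueError("order must be a permutation of range(len(tokens))")
    steps = []
    prev_tok, prev_pos = BOS_TOKEN, BOS_POS
    for pos in order:
        # The model sees the previous token and its position, plus the
        # position it must fill next; the target is the token found there.
        steps.append((prev_tok, prev_pos, pos, tokens[pos]))
        prev_tok, prev_pos = tokens[pos], pos
    return steps


if __name__ == "__main__":
    print(shuffled_steps([10, 20, 30], [2, 0, 1]))
    # [(-1, -1, 2, 30), (30, 2, 0, 10), (10, 0, 1, 20)]
```

With `order = [0, 1, 2]`, the steps reduce to ordinary left-to-right next-token prediction. Any other permutation predicts the same tokens in a different order, which is what would let a model condition on an arbitrary subset of known tokens by placing their positions first in `order`.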
