MARC View :: Library Catalog

Bibliographic Details
Main Author:	Chiang, David
Format:	Preprint
Published:	2024
Subjects:	Computational Complexity; Formal Languages and Automata Theory; Machine Learning
Online Access:	https://arxiv.org/abs/2409.13629

_version_	1866916549233737728
author	Chiang, David
author_facet	Chiang, David
contents	Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However, these results assume limited-precision arithmetic: using floating-point numbers with O(log n) bits (where n is the length of the input string), Strobl showed that AHATs can be approximated in L-uniform TC$^0$, and Merrill and Sabharwal showed that SMATs can be approximated in DLOGTIME-uniform TC$^0$. Here, we improve these results, showing that AHATs with no approximation, SMATs with O(poly(n)) bits of floating-point precision, and SMATs with at most $2^{-O(poly(n))}$ absolute error are all in DLOGTIME-uniform TC$^0$.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_13629
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Transformers in Uniform TC$^0$ Chiang, David Computational Complexity Formal Languages and Automata Theory Machine Learning Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However, these results assume limited-precision arithmetic: using floating-point numbers with O(log n) bits (where n is the length of the input string), Strobl showed that AHATs can be approximated in L-uniform TC$^0$, and Merrill and Sabharwal showed that SMATs can be approximated in DLOGTIME-uniform TC$^0$. Here, we improve these results, showing that AHATs with no approximation, SMATs with O(poly(n)) bits of floating-point precision, and SMATs with at most $2^{-O(poly(n))}$ absolute error are all in DLOGTIME-uniform TC$^0$.
title	Transformers in Uniform TC$^0$
topic	Computational Complexity Formal Languages and Automata Theory Machine Learning
url	https://arxiv.org/abs/2409.13629

Similar Items