Bibliographic Details
Main authors: D'Istria, Pierre Colonna, Altahhan, Abdulrahman
Format: Preprint
Published: 2024
Subjects: Computation and Language, Artificial Intelligence
Online access: https://arxiv.org/abs/2411.07218
_version_ 1866915014114279424
author D'Istria, Pierre Colonna
Altahhan, Abdulrahman
author_facet D'Istria, Pierre Colonna
Altahhan, Abdulrahman
contents In this paper, we introduce TreeCoders, a novel family of transformer trees. We move away from traditional linear transformers to complete k-ary trees. Transformer blocks serve as nodes, and generic classifiers learn to select the best child and route the sequence of tokens to a specific leaf. Because the selectors sit outside the transformer blocks, a variety of architectures can be used without further modification. Furthermore, the proposed architecture supports sparse node activation thanks to the logarithmic complexity of a tree search. We validate the idea by testing a series of decoder-only tree transformers, achieving competitive results across a diverse range of language datasets. Our study demonstrates that the proposed tree transformer model outperforms a size-equivalent linear transformer model 76% of the time over a wide range of tree architectures. Furthermore, the proposed model naturally lends itself to distributed implementation.
format Preprint
id arxiv_https___arxiv_org_abs_2411_07218
institution arXiv
publishDate 2024
record_format arxiv
title TreeCoders: Trees of Transformers
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2411.07218
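
The routing idea summarized in the abstract (blocks as nodes of a complete k-ary tree, with an external selector choosing one child per level) can be illustrated with a small sketch. This is not the authors' implementation: the heap-style node indexing, the names `route`, `toy_block`, and `make_toy_selector`, and the toy arithmetic standing in for real transformer blocks and learned classifiers are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def total_nodes(k: int, height: int) -> int:
    """Node count of a complete k-ary tree of the given height (root at depth 0)."""
    return sum(k ** depth for depth in range(height + 1))


def child_index(node: int, k: int, choice: int) -> int:
    """Heap-style index (an assumed layout) of the choice-th child of `node`."""
    return k * node + 1 + choice


def route(
    x: Vector,
    k: int,
    height: int,
    block: Callable[[int, Vector], Vector],
    selector: Callable[[int, Vector], int],
) -> Tuple[Vector, List[int]]:
    """Send x from the root to a single leaf, running one node per level."""
    node = 0
    path = [node]
    x = block(node, x)
    for _ in range(height):
        choice = selector(node, x)  # the selector lives outside the block
        if not 0 <= choice < k:
            raise ValueError(f"selector returned {choice}, expected 0..{k - 1}")
        node = child_index(node, k, choice)
        x = block(node, x)
        path.append(node)
    return x, path


def toy_block(node: int, x: Vector) -> Vector:
    """Hypothetical stand-in for a transformer block: adds 1 to every entry."""
    return [value + 1.0 for value in x]


def make_toy_selector(k: int) -> Callable[[int, Vector], int]:
    """Hypothetical stand-in for a learned classifier over k children."""
    def selector(node: int, x: Vector) -> int:
        return int(sum(x)) % k
    return selector
```

Under these assumptions, calling `route([0.0, 1.0], 3, 2, toy_block, make_toy_selector(3))` visits one node per level, so only 3 of the 13 nodes in a ternary tree of height 2 are run. This is the sparse, logarithmic activation pattern the abstract refers to.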