

Bibliographic Details
Main Authors:	Sanford, Clayton; Hsu, Daniel; Telgarsky, Matus
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2402.09268

Abstract:

We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is sufficient for transformers to solve basic computational tasks that cannot be efficiently solved by several other neural sequence models and sub-quadratic transformer approximations. We thus establish parallelism as a key distinguishing property of transformers.
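The abstract does not say which tasks it has in mind. For intuition about why a logarithmic number of parallel rounds can be enough, here is a generic pointer-doubling sketch. It is not the paper's construction. It assumes a hypothetical task: follow a pointer array `nxt` k times from every position. In each round, every position performs a constant number of lookups into the previous round's state, which loosely mirrors one synchronous parallel step. The k-fold composition then takes about log2(k) rounds instead of k sequential steps. The function name `compose_k_times` is illustrative only.

```python
from typing import List, Tuple


def compose_k_times(nxt: List[int], k: int) -> Tuple[List[int], int]:
    """Apply the map i -> nxt[i] k times to every index via pointer doubling.

    Returns (result, rounds), where result[i] == nxt^k(i) and rounds is the
    number of synchronous parallel steps used (k.bit_length()).
    """
    n = len(nxt)
    result = list(range(n))  # nxt^0 is the identity map
    power = list(nxt)        # holds nxt^(2^j); starts at j = 0
    rounds = 0
    while k > 0:
        # Both updates below read only the previous round's `power`,
        # so they could run simultaneously for all positions.
        if k & 1:
            # Fold the current power of two into the answer: result <- power o result
            result = [power[result[i]] for i in range(n)]
        # Square: nxt^(2^j) o nxt^(2^j) = nxt^(2^(j+1)), doubling the hop length
        power = [power[power[i]] for i in range(n)]
        k >>= 1
        rounds += 1
    return result, rounds


if __name__ == "__main__":
    cycle = [1, 2, 3, 4, 5, 6, 7, 0]  # an 8-cycle: i -> (i + 1) % 8
    hops, rounds = compose_k_times(cycle, 5)
    print(hops, rounds)  # [5, 6, 7, 0, 1, 2, 3, 4] 3
```

Following the pointer 1000 times this way takes 10 rounds rather than 1000 steps. Powers of the same map commute, so the order in which the binary digits of k are folded in does not matter.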
