| Main Authors: | Gao, Xin; Xu, Xingming; Amiraslani, Shirin; Xu, Hong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning; Artificial Intelligence; Computation and Language; 68T05 |
| Online Access: | https://arxiv.org/abs/2507.20096 |
| _version_ | 1866913976863948800 |
|---|---|
| author | Gao, Xin; Xu, Xingming; Amiraslani, Shirin; Xu, Hong |
| author_facet | Gao, Xin; Xu, Xingming; Amiraslani, Shirin; Xu, Hong |
| contents | The Transformer, with its scaled dot-product attention mechanism, has become a foundational architecture in modern AI. However, this mechanism is computationally intensive and incurs substantial energy costs. We propose a new Transformer architecture, EcoTransformer, in which the output context vector is constructed as the convolution of the values using a Laplacian kernel, where the distances are measured by the L1 metric between the queries and keys. Compared to dot-product-based attention, the new attention score calculation is free of matrix multiplication. It performs on par with, or even surpasses, scaled dot-product attention in NLP, bioinformatics, and vision tasks, while consuming significantly less energy. (This version (v2) supersedes v1 and reflects the intended release and licensing.) |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2507_20096 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | EcoTransformer: Attention without Multiplication |
| topic | Machine Learning; Artificial Intelligence; Computation and Language; 68T05 |
| url | https://arxiv.org/abs/2507.20096 |
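
The abstract describes attention weights built from a Laplacian kernel over L1 distances between queries and keys, in place of dot products. The following is a minimal sketch of that idea, not the authors' implementation: the function name `l1_laplacian_attention`, the `scale` bandwidth parameter, and the softmax-style normalization are all assumptions made for illustration, since this record does not give the paper's exact formulation.

```python
import math


def l1_laplacian_attention(queries, keys, values, scale=1.0):
    """Sketch of attention weighted by a Laplacian kernel on L1 distances.

    queries: list of query vectors (lists of floats), each of length d
    keys:    list of key vectors of length d
    values:  list of value vectors, one per key
    scale:   hypothetical bandwidth parameter (an assumption, not from the record)
    """
    outputs = []
    for q in queries:
        # The L1 distance itself needs only subtraction, absolute value, and addition.
        scores = [-sum(abs(qi - ki) for qi, ki in zip(q, k)) / scale for k in keys]
        # Normalizing exp(-distance / scale) gives Laplacian-kernel weights that sum to 1.
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Context vector: weighted average of the value vectors.
        dim_v = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return outputs
```

In this sketch, keys that are close to a query in L1 distance receive most of the weight, and keys at equal distance receive equal weight. The final weighted average over the values still uses multiplication; only the score computation avoids the query-key matrix product, which matches the claim in the abstract.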