Bibliographic Details
Main Authors: Gao, Xin; Xu, Xingming; Amiraslani, Shirin; Xu, Hong
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2507.20096
_version_ 1866913976863948800
author Gao, Xin
Xu, Xingming
Amiraslani, Shirin
Xu, Hong
author_facet Gao, Xin
Xu, Xingming
Amiraslani, Shirin
Xu, Hong
contents The Transformer, with its scaled dot-product attention mechanism, has become a foundational architecture in modern AI. However, this mechanism is computationally intensive and incurs substantial energy costs. We propose a new Transformer architecture, EcoTransformer, in which the output context vector is constructed as the convolution of the values using a Laplacian kernel, where the distances are measured by the L1 metric between the queries and keys. Compared to dot-product-based attention, the new attention score calculation is free of matrix multiplication. It performs on par with, or even surpasses, scaled dot-product attention in NLP, bioinformatics, and vision tasks, while consuming significantly less energy. (This version (v2) supersedes v1 and reflects the intended release and licensing.)
format Preprint
id arxiv_https___arxiv_org_abs_2507_20096
institution arXiv
publishDate 2025
record_format arxiv
title EcoTransformer: Attention without Multiplication
topic Machine Learning
Artificial Intelligence
Computation and Language
68T05
url https://arxiv.org/abs/2507.20096
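
The abstract describes attention scores built from a Laplacian kernel over L1 distances between queries and keys. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name l1_laplacian_attention, the scale parameter and its default of sqrt(d), and the softmax-style row normalization are all assumptions made for illustration, since the record does not specify them.

```python
import numpy as np

def l1_laplacian_attention(Q, K, V, scale=None):
    """Illustrative L1-distance attention (hypothetical helper, not from the paper).

    Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values.
    """
    Q = np.asarray(Q, dtype=float)
    K = np.asarray(K, dtype=float)
    V = np.asarray(V, dtype=float)
    d = Q.shape[-1]
    if scale is None:
        scale = np.sqrt(d)  # assumed default, mirroring dot-product attention

    # Pairwise L1 distances, shape (n_q, n_k): only subtraction,
    # absolute value, and addition are used here.
    dist = np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)

    # Laplacian kernel exp(-dist / scale), normalized over the keys.
    logits = -dist / scale
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Kernel-weighted average of the values.
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 3))
out, w = l1_laplacian_attention(Q, Q, V)
print(out.shape, w.sum(axis=-1))
```

In this sketch the score computation (the dist array) uses no multiplications between queries and keys, which is the property the abstract highlights. The final weighted sum over the values is still written as a matrix product here for brevity.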