Staff View: :: Library Catalog


Bibliographic Details
Main Authors:	Gao, Xin; Xu, Xingming; Amiraslani, Shirin; Xu, Hong
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; 68T05
Online Access:	https://arxiv.org/abs/2507.20096

_version_	1866913976863948800
author	Gao, Xin; Xu, Xingming; Amiraslani, Shirin; Xu, Hong
author_facet	Gao, Xin; Xu, Xingming; Amiraslani, Shirin; Xu, Hong
contents	The Transformer, with its scaled dot-product attention mechanism, has become a foundational architecture in modern AI. However, this mechanism is computationally intensive and incurs substantial energy costs. We propose a new Transformer architecture, EcoTransformer, in which the output context vector is constructed as the convolution of the values with a Laplacian kernel, where the distances between queries and keys are measured in the L1 metric. Compared with dot-product-based attention, the new attention-score calculation is free of matrix multiplication. EcoTransformer performs on par with, or even surpasses, scaled dot-product attention on NLP, bioinformatics, and vision tasks, while consuming significantly less energy. (This version (v2) supersedes v1 and reflects the intended release and licensing.)
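The attention rule summarized in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `laplacian_l1_attention`, the bandwidth parameter `lam`, and the choice to normalize the kernel weights so they sum to one over the keys are all hypothetical. Each weight is taken proportional to exp(-||q_i - k_j||_1 / lam), so computing the scores needs only subtraction, absolute value, and summation; the final aggregation of the values in this sketch is still an ordinary weighted sum (`weights @ V`).

```python
import numpy as np


def laplacian_l1_attention(Q, K, V, lam=1.0):
    """Hypothetical sketch: attention weights from a Laplacian kernel on L1 distances.

    Assumptions (not taken from the paper): weights are exp(-||q_i - k_j||_1 / lam),
    normalized to sum to 1 over keys; `lam` is an illustrative bandwidth.
    Shapes: Q (n, d), K (m, d), V (m, d_v). Returns (context (n, d_v), weights (n, m)).
    """
    # Pairwise L1 distances via broadcasting: subtraction, abs, and sum only.
    dist = np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)  # (n, m)
    # Shift by each row's minimum distance; the constant factor cancels after normalization.
    logits = -(dist - dist.min(axis=1, keepdims=True)) / lam
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Context vectors: kernel-weighted average of the values.
    return weights @ V, weights


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 3))
context, weights = laplacian_l1_attention(Q, K, V, lam=2.0)
print(context.shape)  # (4, 3)
```

Subtracting each row's minimum distance before exponentiating does not change the normalized weights, because exp(d_min / lam) is a per-row constant that cancels, but it keeps the largest weight at exp(0) = 1 and so avoids underflow when every distance is large.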
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_20096
institution	arXiv
publishDate	2025
record_format	arxiv
title	EcoTransformer: Attention without Multiplication
topic	Machine Learning; Artificial Intelligence; Computation and Language; 68T05
url	https://arxiv.org/abs/2507.20096

Similar Items