Staff View: Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Qian; Wang, Wen; Zhang, Qinglin; Zheng, Siqi; Zhang, Shiliang; Deng, Chong; Yu, Hai; Liu, Jiaqing; Ma, Yukun; Zhang, Chong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2406.11274

_version_	1866929388028690432
author	Chen, Qian; Wang, Wen; Zhang, Qinglin; Zheng, Siqi; Zhang, Shiliang; Deng, Chong; Yu, Hai; Liu, Jiaqing; Ma, Yukun; Zhang, Chong
author_facet	Chen, Qian; Wang, Wen; Zhang, Qinglin; Zheng, Siqi; Zhang, Shiliang; Deng, Chong; Yu, Hai; Liu, Jiaqing; Ma, Yukun; Zhang, Chong
contents	The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention between non-adjacent layers. This method improves the model's ability to capture dependencies between high-level abstract features and low-level details. By facilitating direct attention between these diverse feature levels, our approach overcomes the limitations of current Transformers, which often rely on suboptimal intra-layer attention. Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer, thus enhancing the diversity of multi-head attention without additional computational burden. Extensive experiments demonstrate that our enhanced Transformer model achieves superior performance in language modeling tasks, highlighting the effectiveness of our skip-layer attention mechanism.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_11274
institution	arXiv
publishDate	2024
record_format	arxiv
title	Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
topic	Computation and Language
url	https://arxiv.org/abs/2406.11274
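
Illustrative note: the abstract describes queries in a given layer interacting with keys and values from both the current layer and one preceding layer, without added computation. The record does not give the implementation, so the following is only a minimal sketch under a stated assumption: attention heads are split so that some heads read keys/values cached from an earlier layer while the rest read them from the current layer. The names `attend`, `skip_layer_attention`, and `n_skip_heads` are hypothetical and not taken from the paper, and causal masking is omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention for one head (no masking).

    queries, keys, values are lists of vectors (lists of floats).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def skip_layer_attention(head_queries, current_kv, earlier_kv, n_skip_heads):
    """Assumed head split (hypothetical, not confirmed by the source).

    The first n_skip_heads heads use (keys, values) from an earlier layer;
    the remaining heads use (keys, values) from the current layer.
    Each *_kv argument is a list with one (keys, values) pair per head.
    """
    outputs = []
    for h, queries in enumerate(head_queries):
        keys, values = earlier_kv[h] if h < n_skip_heads else current_kv[h]
        outputs.append(attend(queries, keys, values))
    return outputs


# Two heads, one query position, one key/value position per head.
head_queries = [[[1.0, 0.0]], [[0.0, 1.0]]]
current_kv = [([[1.0, 0.0]], [[10.0, 20.0]]), ([[1.0, 0.0]], [[30.0, 40.0]])]
earlier_kv = [([[0.0, 1.0]], [[1.0, 2.0]]), ([[0.0, 1.0]], [[3.0, 4.0]])]
mixed = skip_layer_attention(head_queries, current_kv, earlier_kv, n_skip_heads=1)
```

Under this assumption each head still attends over a single set of keys and values, so the per-head attention cost matches a standard layer; only the source layer of the keys and values changes for the skip heads.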

Similar Items