Staff View: S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xue; Zhou, Tian; Zhu, Jianqing; Liu, Jialin; Yuan, Kun; Yao, Tao; Yin, Wotao; Jin, Rong; Cai, HanQin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computer Vision and Pattern Recognition; Image and Video Processing
Online Access:	https://arxiv.org/abs/2408.08567

_version_	1866929502153605120
author	Wang, Xue; Zhou, Tian; Zhu, Jianqing; Liu, Jialin; Yuan, Kun; Yao, Tao; Yin, Wotao; Jin, Rong; Cai, HanQin
author_facet	Wang, Xue; Zhou, Tian; Zhu, Jianqing; Liu, Jialin; Yuan, Kun; Yao, Tao; Yin, Wotao; Jin, Rong; Cai, HanQin
contents	Attention-based models have achieved remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes vanilla Attention-based models hard to apply to long-sequence tasks. Various improved Attention structures have been proposed to reduce the computation cost by inducing low-rankness and approximating the whole sequence by sub-sequences. The most challenging part of those approaches is maintaining the proper balance between information preservation and computation reduction: the longer the sub-sequences used, the better information is preserved, but at the price of introducing more noise and computational cost. In this paper, we propose a smoothed skeleton sketching based Attention structure, coined S$^3$Attention, which significantly improves upon previous attempts to negotiate this trade-off. S$^3$Attention has two mechanisms to effectively minimize the impact of noise while keeping complexity linear in the sequence length: a smoothing block to mix information over long sequences and a matrix sketching method that simultaneously selects columns and rows from the input matrix. We verify the effectiveness of S$^3$Attention both theoretically and empirically. Extensive studies over Long Range Arena (LRA) datasets and six time-series forecasting tasks show that S$^3$Attention significantly outperforms both vanilla Attention and other state-of-the-art variants of Attention structures.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_08567
institution	arXiv
publishDate	2024
record_format	arxiv
title	S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching
topic	Machine Learning; Computer Vision and Pattern Recognition; Image and Video Processing
url	https://arxiv.org/abs/2408.08567

Similar Items