Bibliographic Details
Main Authors: Lucas, Evan; Kangas, Dylan; Havens, Timothy C
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2410.08971
_version_ 1866913542039404544
author Lucas, Evan
Kangas, Dylan
Havens, Timothy C
author_facet Lucas, Evan
Kangas, Dylan
Havens, Timothy C
contents In this paper, we propose an extension to Longformer Encoder-Decoder, a popular sparse transformer architecture. One common challenge with sparse transformers is that they can struggle to encode long-range context, such as connections between topics discussed at the beginning and the end of a document. A method to selectively increase global attention is proposed and demonstrated for abstractive summarization tasks on several benchmark data sets. Prefixing the transcript with additional keywords and assigning global attention to these keywords improves results in the zero-shot, few-shot, and fine-tuned cases for some of these benchmark data sets.
format Preprint
id arxiv_https___arxiv_org_abs_2410_08971
institution arXiv
publishDate 2024
record_format arxiv
title Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures
topic Computation and Language
url https://arxiv.org/abs/2410.08971
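
The keyword-prefixing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prefix_keywords`, the token layout, the special tokens `<s>` and `</s>`, the whitespace tokenization, and the choice to also give the start token global attention are all assumptions made for illustration, and the keywords are assumed to have been detected already. The resulting mask plays the role of the `global_attention_mask` input accepted by the Hugging Face LED models, where 1 marks a globally attending position.

```python
from typing import List, Tuple


def prefix_keywords(
    keywords: List[str],
    transcript_tokens: List[str],
    bos_token: str = "<s>",
    sep_token: str = "</s>",
) -> Tuple[List[str], List[int]]:
    """Prefix keywords to a tokenized transcript and mark them as global.

    Returns the combined token list and a mask of the same length, where
    1 means "global attention" and 0 means "local (windowed) attention only".
    """
    # Hypothetical layout: <s> kw_1 ... kw_n </s> transcript tokens
    tokens = [bos_token] + keywords + [sep_token] + transcript_tokens
    # The start token and every prefixed keyword get global attention;
    # the separator and the transcript itself stay local.
    mask = [1] + [1] * len(keywords) + [0] + [0] * len(transcript_tokens)
    return tokens, mask


# Toy example with whitespace tokenization (an assumption; a real model
# would use its own subword tokenizer).
tokens, mask = prefix_keywords(
    ["budget", "deadline"],
    "we will revisit the budget after the deadline".split(),
)
```

Only the prefixed positions are promoted to global attention here, so the number of global tokens grows with the keyword list and not with the document length.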