| Main Authors: | Lucas, Evan; Kangas, Dylan; Havens, Timothy C |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2410.08971 |
| _version_ | 1866913542039404544 |
|---|---|
| author | Lucas, Evan; Kangas, Dylan; Havens, Timothy C |
| author_facet | Lucas, Evan; Kangas, Dylan; Havens, Timothy C |
| contents | In this paper, we propose an extension to Longformer Encoder-Decoder, a popular sparse transformer architecture. One common challenge with sparse transformers is that they can struggle with encoding of long range context, such as connections between topics discussed at a beginning and end of a document. A method to selectively increase global attention is proposed and demonstrated for abstractive summarization tasks on several benchmark data sets. By prefixing the transcript with additional keywords and encoding global attention on these keywords, improvement in zero-shot, few-shot, and fine-tuned cases is demonstrated for some benchmark data sets. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2410_08971 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2410.08971 |
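
The abstract describes prefixing the transcript with keywords and placing global attention on them. The following is a minimal sketch of that idea, not the authors' implementation: the function name, the pre-tokenized string inputs, and the `<s>`/`</s>` marker tokens are all assumptions made for illustration. It builds the kind of 0/1 mask that Longformer-style models accept as a global attention mask, where 1 marks a globally attending position and 0 a local-window position.

```python
from typing import List, Tuple


def build_global_attention_mask(
    keywords: List[str],
    transcript_tokens: List[str],
    sep_token: str = "</s>",  # assumed separator between keywords and transcript
) -> Tuple[List[str], List[int]]:
    """Prefix the transcript with keywords and mark them for global attention.

    Hypothetical helper: real pipelines would operate on tokenizer ids,
    and a keyword may span several subword tokens.
    """
    # Layout: <s> kw_1 ... kw_n </s> transcript tokens
    tokens = ["<s>"] + keywords + [sep_token] + transcript_tokens

    # Start with local (sliding-window) attention everywhere.
    mask = [0] * len(tokens)

    # Assumption: the start token keeps global attention, as is common
    # practice for summarization with Longformer Encoder-Decoder.
    mask[0] = 1

    # Extra global attention on each prefixed keyword position.
    for i in range(1, 1 + len(keywords)):
        mask[i] = 1

    return tokens, mask


if __name__ == "__main__":
    toks, mask = build_global_attention_mask(
        ["budget", "deadline"],
        ["we", "discussed", "the", "budget"],
    )
    for tok, flag in zip(toks, mask):
        print(f"{flag}  {tok}")
```

How the keywords are detected, and whether the separator itself should attend globally, are left open here because the record does not specify them.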