Staff View: Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck :: Library Catalog

Bibliographic Details
Main Authors:	Diera, Andor; Galke, Lukas; Karl, Fabian; Scherp, Ansgar
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.08528

_version_	1866908872451555328
author	Diera, Andor; Galke, Lukas; Karl, Fabian; Scherp, Ansgar
author_facet	Diera, Andor; Galke, Lukas; Karl, Fabian; Scherp, Ansgar
contents	Continual learning remains a challenge across various natural language processing (NLP) tasks, as models updated with new training data often risk catastrophic forgetting of previously acquired knowledge. We introduce a discrete key-value bottleneck (DKVB) for encoder-only language models, enabling efficient continual learning through localized updates. Inspired by a discrete key-value bottleneck in vision, we consider new and NLP-specific challenges. We compare different bottleneck architectures for NLP and introduce a new, task-independent initialization technique for the discrete keys. We evaluate our DKVB for NLP in four continual learning scenarios and show that it alleviates catastrophic forgetting. Our experiments demonstrate that the proposed approach achieves competitive performance compared to popular continual learning methods while incurring lower computational costs. Furthermore, we show that DKVB remains effective even in challenging single-head continual learning scenarios where no task ID is provided.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_08528
institution	arXiv
publishDate	2024
record_format	arxiv
title	Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck
topic	Computation and Language
url	https://arxiv.org/abs/2412.08528
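
Illustrative note: the abstract attributes the method's resistance to catastrophic forgetting to localized updates through a discrete key-value bottleneck. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes a single codebook with frozen keys and trainable values, a nearest-key lookup by squared Euclidean distance, and a plain squared-error gradient step. The class name, method names, and parameters are hypothetical, the keys are random rather than produced by the paper's task-independent initialization, and a hand-written vector stands in for the output of a frozen encoder-only language model.

```python
import random


class DiscreteKeyValueBottleneck:
    """Toy single-codebook bottleneck: frozen keys, trainable values.

    Hypothetical illustration only; not the architecture from the paper.
    """

    def __init__(self, num_keys, key_dim, value_dim, seed=0):
        rng = random.Random(seed)
        # Keys stay fixed after creation. Random keys are an assumption here,
        # standing in for the paper's initialization technique.
        self.keys = [
            [rng.gauss(0.0, 1.0) for _ in range(key_dim)]
            for _ in range(num_keys)
        ]
        # Values are the only parameters that training changes.
        self.values = [[0.0] * value_dim for _ in range(num_keys)]

    def lookup(self, encoding):
        """Return the index of the key nearest to the encoding."""
        def sq_dist(key):
            return sum((k - e) ** 2 for k, e in zip(key, encoding))

        return min(range(len(self.keys)), key=lambda i: sq_dist(self.keys[i]))

    def forward(self, encoding):
        """Quantize the encoding to a key index and fetch its value."""
        idx = self.lookup(encoding)
        return idx, self.values[idx]

    def update(self, encoding, target, lr=0.25):
        """Take one squared-error gradient step on the selected value only."""
        idx, value = self.forward(encoding)
        self.values[idx] = [
            v - lr * 2.0 * (v - t) for v, t in zip(value, target)
        ]
        return idx


# Usage: a single update touches exactly one value row.
bottleneck = DiscreteKeyValueBottleneck(num_keys=8, key_dim=4, value_dim=2, seed=1)
snapshot = [row[:] for row in bottleneck.values]
hit = bottleneck.update([0.1, -0.2, 0.3, 0.4], [1.0, -1.0])
changed_rows = [i for i in range(8) if bottleneck.values[i] != snapshot[i]]
print(changed_rows == [hit])
```

Because each update changes only the value row selected by the lookup, inputs that map to other keys keep their stored values. That is the localized-update property the abstract refers to, shown here under the stated assumptions.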

Similar Items