MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Arnold, Stefan; Fietta, Marian; Yesilbas, Dilara
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.14107

_version_	1866909322025369600
author	Arnold, Stefan; Fietta, Marian; Yesilbas, Dilara
author_facet	Arnold, Stefan; Fietta, Marian; Yesilbas, Dilara
contents	Language Models (LMs) recently incorporate mixture-of-experts layers consisting of a router and a collection of experts to scale up their parameter count given a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_14107
institution	arXiv
publishDate	2024
record_format	arxiv
title	Routing in Sparsely-gated Language Models responds to Context
topic	Computation and Language
url	https://arxiv.org/abs/2409.14107
