Bibliographic Details
Main Authors: Arnold, Stefan, Fietta, Marian, Yesilbas, Dilara
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2409.14107
_version_ 1866909322025369600
author Arnold, Stefan
Fietta, Marian
Yesilbas, Dilara
author_facet Arnold, Stefan
Fietta, Marian
Yesilbas, Dilara
contents Language Models (LMs) recently incorporate mixture-of-experts layers consisting of a router and a collection of experts to scale up their parameter count given a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
format Preprint
id arxiv_https___arxiv_org_abs_2409_14107
institution arXiv
publishDate 2024
record_format arxiv
title Routing in Sparsely-gated Language Models responds to Context
topic Computation and Language
url https://arxiv.org/abs/2409.14107