Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Wicks, Rachel, Ravisankar, Kartik, Yang, Xinchen, Koehn, Philipp, Post, Matt
Format:	Preprint
Publié:	2025
Sujets:	Computation and Language
Accès en ligne:	https://arxiv.org/abs/2502.21265
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866913712833560576
author	Wicks, Rachel Ravisankar, Kartik Yang, Xinchen Koehn, Philipp Post, Matt
author_facet	Wicks, Rachel Ravisankar, Kartik Yang, Xinchen Koehn, Philipp Post, Matt
contents	Model ensembling is a technique to combine the predicted distributions of two or more models, often leading to improved robustness and performance. For ensembling in text generation, the next token's probability distribution is derived from a weighted sum of the distributions of each individual model. This requires the underlying models to share the same subword vocabulary, limiting the applicability of ensembling, since many open-sourced models have distinct vocabularies. In research settings, experimentation or upgrades to vocabularies may introduce multiple vocabulary sizes. This paper proposes an inference-time only algorithm that allows for ensembling models with different vocabularies, without the need to learn additional parameters or alter the underlying models. Instead, the algorithm ensures that tokens generated by the ensembled models \textit{agree} in their surface form. We apply this technique to combinations of traditional encoder-decoder models and decoder-only LLMs and evaluate on machine translation. In addition to expanding to model pairs that were previously incapable of token-level ensembling, our algorithm frequently improves translation performance over either model individually.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_21265
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Token-level Ensembling of Models with Different Vocabularies Wicks, Rachel Ravisankar, Kartik Yang, Xinchen Koehn, Philipp Post, Matt Computation and Language Model ensembling is a technique to combine the predicted distributions of two or more models, often leading to improved robustness and performance. For ensembling in text generation, the next token's probability distribution is derived from a weighted sum of the distributions of each individual model. This requires the underlying models to share the same subword vocabulary, limiting the applicability of ensembling, since many open-sourced models have distinct vocabularies. In research settings, experimentation or upgrades to vocabularies may introduce multiple vocabulary sizes. This paper proposes an inference-time only algorithm that allows for ensembling models with different vocabularies, without the need to learn additional parameters or alter the underlying models. Instead, the algorithm ensures that tokens generated by the ensembled models \textit{agree} in their surface form. We apply this technique to combinations of traditional encoder-decoder models and decoder-only LLMs and evaluate on machine translation. In addition to expanding to model pairs that were previously incapable of token-level ensembling, our algorithm frequently improves translation performance over either model individually.
title	Token-level Ensembling of Models with Different Vocabularies
topic	Computation and Language
url	https://arxiv.org/abs/2502.21265

Documents similaires