MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Davoodi, Arash Gholami; Rezazadeh, Navid; Davoudi, Seyed Pouyan Mousavi; Pezeshkpour, Pouya
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2602.10346

_version_	1866914563966894080
author	Davoodi, Arash Gholami; Rezazadeh, Navid; Davoudi, Seyed Pouyan Mousavi; Pezeshkpour, Pouya
author_facet	Davoodi, Arash Gholami; Rezazadeh, Navid; Davoudi, Seyed Pouyan Mousavi; Pezeshkpour, Pouya
contents	Large language models (LLMs) must balance diversity and creativity against logical coherence in open-ended generation. Existing truncation-based samplers are effective but largely heuristic, relying mainly on probability mass and entropy while ignoring the semantic geometry of the token space. We present Top-W, a geometry-aware truncation rule that uses a Wasserstein distance, defined over token-embedding geometry, to keep the cropped distribution close to the original while explicitly balancing retained probability mass against the entropy of the kept set. Our theory yields a simple closed-form structure for the fixed-potential subset update: depending on the mass-entropy trade-off, the optimal crop either collapses to a single token or takes the form of a one-dimensional prefix that can be found efficiently with a linear scan. We implement Top-W using efficient geometry-based potentials (nearest-set or k-NN) and pair it with an alternating decoding routine that keeps the standard truncation-and-sampling interface unchanged. Extensive experiments on four benchmarks (GSM8K, GPQA, AlpacaEval, and MT-Bench) across three instruction-tuned models show that Top-W consistently outperforms prior state-of-the-art decoding approaches, achieving up to a 33.7% improvement. Moreover, we find that Top-W not only improves accuracy-focused performance but also boosts creativity under judge-based open-ended evaluation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_10346
institution	arXiv
publishDate	2026
record_format	arxiv
title	Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2602.10346

Similar Items