Staff View: Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Fang, Liancheng; Liu, Aiwei; Zou, Henry Peng; Chen, Yankai; Ma, Enze; Pan, Leyi; Miao, Chunyu; Huang, Wei-Chieh; Liu, Xue; Yu, Philip S.
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.00375

_version_	1866915904408780800
author	Fang, Liancheng; Liu, Aiwei; Zou, Henry Peng; Chen, Yankai; Ma, Enze; Pan, Leyi; Miao, Chunyu; Huang, Wei-Chieh; Liu, Xue; Yu, Philip S.
author_facet	Fang, Liancheng; Liu, Aiwei; Zou, Henry Peng; Chen, Yankai; Ma, Enze; Pan, Leyi; Miao, Chunyu; Huang, Wei-Chieh; Liu, Xue; Yu, Philip S.
contents	Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality-exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and develop a simple Independent Metropolis-Hastings sampler that approximately targets this distribution during decoding. Experiments across a range of reasoning benchmarks, including MATH500, AIME24/25, HumanEval, and MBPP, show that our approach yields a better exploration-quality tradeoff than both random and low-confidence remasking.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_00375
institution	arXiv
publishDate	2026
record_format	arxiv
title	Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2604.00375

Similar Items