Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Yang, Chenxiao, Zhou, Cai, Wipf, David, Li, Zhiyuan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2510.06190
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866914159685271552
author	Yang, Chenxiao Zhou, Cai Wipf, David Li, Zhiyuan
author_facet	Yang, Chenxiao Zhou, Cai Wipf, David Li, Zhiyuan
contents	Diffusion language models have recently emerged as a competitive alternative to autoregressive language models. Beyond next-token generation, they are more efficient and flexible by enabling parallel and any-order token generation. However, despite empirical successes, their computational power and fundamental limitations remain poorly understood. In this paper, we formally study whether non-autoregressive generation in Masked Diffusion Models (MDM) enables solving problems beyond the reach of Auto-Regressive Models (ARM). Our results show that MDM with sufficiently large context length is computationally universal with decoding steps matching the optimal parallel time complexity in PRAM. However, when controlling for other factors, MDM's flexibility to generate in any-order does not expand what ARM can already solve. To address this, we propose a new form of generation called any-process generation, which extends MDM with capabilities to remask, insert and delete tokens, allowing self-correction, length-variable editing, and adaptive parallelism. Theoretically and empirically, we demonstrate these capabilities enable scalability to significantly harder reasoning problems that are otherwise intractable for ARM and vanilla MDM. Additionally, they prove essential for generation tasks where objects naturally evolve through non-sequential processes, crucial for extending current LLMs beyond natural language to domains such as coding and science.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_06190
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond Yang, Chenxiao Zhou, Cai Wipf, David Li, Zhiyuan Machine Learning Diffusion language models have recently emerged as a competitive alternative to autoregressive language models. Beyond next-token generation, they are more efficient and flexible by enabling parallel and any-order token generation. However, despite empirical successes, their computational power and fundamental limitations remain poorly understood. In this paper, we formally study whether non-autoregressive generation in Masked Diffusion Models (MDM) enables solving problems beyond the reach of Auto-Regressive Models (ARM). Our results show that MDM with sufficiently large context length is computationally universal with decoding steps matching the optimal parallel time complexity in PRAM. However, when controlling for other factors, MDM's flexibility to generate in any-order does not expand what ARM can already solve. To address this, we propose a new form of generation called any-process generation, which extends MDM with capabilities to remask, insert and delete tokens, allowing self-correction, length-variable editing, and adaptive parallelism. Theoretically and empirically, we demonstrate these capabilities enable scalability to significantly harder reasoning problems that are otherwise intractable for ARM and vanilla MDM. Additionally, they prove essential for generation tasks where objects naturally evolve through non-sequential processes, crucial for extending current LLMs beyond natural language to domains such as coding and science.
title	On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond
topic	Machine Learning
url	https://arxiv.org/abs/2510.06190

Similar Items