Bibliographic Details
Main Authors: Yang, Chenxiao; Zhou, Cai; Wipf, David; Li, Zhiyuan
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2510.06190
author Yang, Chenxiao
Zhou, Cai
Wipf, David
Li, Zhiyuan
contents Diffusion language models have recently emerged as a competitive alternative to autoregressive language models. Rather than being limited to next-token generation, they gain efficiency and flexibility by enabling parallel and any-order token generation. However, despite empirical successes, their computational power and fundamental limitations remain poorly understood. In this paper, we formally study whether non-autoregressive generation in Masked Diffusion Models (MDM) enables solving problems beyond the reach of Auto-Regressive Models (ARM). Our results show that MDM with sufficiently large context length is computationally universal, with the number of decoding steps matching the optimal parallel time complexity in PRAM. However, when controlling for other factors, MDM's flexibility to generate in any order does not expand what ARM can already solve. To address this, we propose a new form of generation called any-process generation, which extends MDM with capabilities to remask, insert, and delete tokens, allowing self-correction, length-variable editing, and adaptive parallelism. Theoretically and empirically, we demonstrate that these capabilities enable scaling to significantly harder reasoning problems that are otherwise intractable for ARM and vanilla MDM. Additionally, they prove essential for generation tasks where objects naturally evolve through non-sequential processes, which is crucial for extending current LLMs beyond natural language to domains such as coding and science.
format Preprint
id arxiv_https___arxiv_org_abs_2510_06190
institution arXiv
publishDate 2025
record_format arxiv
title On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond
topic Machine Learning
url https://arxiv.org/abs/2510.06190