Staff View: Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Ai, Qihang; Jiang, Haiyun
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.20744

_version_	1866918147884318720
author	Ai, Qihang; Jiang, Haiyun
author_facet	Ai, Qihang; Jiang, Haiyun
contents	We study reasoning tasks through a framework that integrates autoregressive (AR) and non-autoregressive (NAR) language models. AR models, which generate text sequentially, excel at producing coherent outputs but often suffer from slow inference, particularly in reasoning-intensive domains such as mathematics and code, where lengthy chains of thought are required. In contrast, NAR models, such as discrete diffusion models, allow parallel generation and offer substantial speedups, though typically at the cost of reduced output quality. To address these limitations, we introduce a new paradigm in which an NAR model efficiently produces intermediate reasoning traces, which then guide an AR model to deliver precise final answers. Experiments show that our approach yields a significant 26% improvement over strong baselines while substantially reducing inference cost.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_20744
institution	arXiv
publishDate	2025
record_format	arxiv
title	Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning
topic	Artificial Intelligence
url	https://arxiv.org/abs/2509.20744
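The abstract's efficiency claim rests on replacing a long, strictly sequential chain of thought with parallel drafting. The Python below is a rough, illustrative sketch under stated assumptions. It is not the authors' method or cost model. It counts sequential forward passes as a latency proxy, assuming that an AR model emits one token per pass, that an NAR (e.g. discrete diffusion) model drafts the whole trace in a fixed number of denoising steps, and that the AR model reads the drafted trace in a single prefill pass. The names `DecodeCost`, `ar_only_cost`, `hybrid_cost`, and all sizes in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeCost:
    """Sequential forward passes, used here as a rough latency proxy (assumption)."""

    trace_passes: int   # passes spent producing the reasoning trace
    answer_passes: int  # passes spent producing the final answer

    @property
    def total(self) -> int:
        return self.trace_passes + self.answer_passes


def ar_only_cost(trace_len: int, answer_len: int) -> DecodeCost:
    # Pure AR decoding: one token per forward pass for both the
    # chain of thought and the final answer.
    return DecodeCost(trace_passes=trace_len, answer_passes=answer_len)


def hybrid_cost(trace_len: int, answer_len: int, denoise_steps: int) -> DecodeCost:
    # Hypothetical NAR-then-AR pipeline: the NAR model refines all trace
    # positions in parallel for `denoise_steps` passes. Under this simplified
    # model, trace_len adds no sequential passes (kept for a symmetric signature).
    # The AR model then ingests the trace in one prefill pass and decodes the
    # answer token by token.
    prefill_passes = 1
    return DecodeCost(
        trace_passes=denoise_steps + prefill_passes,
        answer_passes=answer_len,
    )


if __name__ == "__main__":
    # Illustrative sizes only; these are not figures reported in the paper.
    ar = ar_only_cost(trace_len=1024, answer_len=32)
    hy = hybrid_cost(trace_len=1024, answer_len=32, denoise_steps=64)
    print(ar.total, hy.total, round(ar.total / hy.total, 1))  # prints: 1056 97 10.9
```

Under these assumptions the sequential cost of the trace no longer grows with its length, which is the intuition behind the claimed speedup. Real latency also depends on the cost of each pass, and a diffusion pass over the full trace is more expensive than a single AR step, so the pass count alone overstates the gain.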

Similar Items