Bibliographic Details
Main Authors: Yuan, Mu; Zeng, Liekang; Xing, Guoliang; Zhang, Lan; Liu, Yunhao
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.01608
_version_ 1866917241212108800
author Yuan, Mu
Zeng, Liekang
Xing, Guoliang
Zhang, Lan
Liu, Yunhao
contents Autoregressive and diffusion models represent two complementary generative paradigms. Autoregressive models excel at sequential planning and constraint composition, yet struggle with tasks that require explicit spatial or physical grounding. Diffusion models, in contrast, capture rich spatial structure through high-dimensional generation, but lack the stepwise logical control needed to satisfy complex, multi-stage constraints or to reliably identify and correct errors. We introduce Collaborative Thoughts, a unified collaborative framework that enables autoregressive and diffusion models to reason and generate jointly through a closed-loop interaction. In Collaborative Thoughts, autoregressive models perform structured planning and constraint management, diffusion models instantiate these constraints as intermediate visual thoughts, and a vision-based critic module evaluates whether the visual thoughts satisfy the intended structural and physical requirements. This feedback is then used to iteratively refine subsequent planning and generation steps, mitigating error propagation across modalities. Importantly, Collaborative Thoughts uses the same collaborative loop regardless of whether the task is autoregressive question answering or diffusion-based visual generation. Through representative examples, we illustrate how Collaborative Thoughts can improve the reliability of spatial reasoning and the controllability of generation.
format Preprint
id arxiv_https___arxiv_org_abs_2602_01608
institution arXiv
publishDate 2026
record_format arxiv
title Reasoning with Autoregressive-Diffusion Collaborative Thoughts
topic Artificial Intelligence
url https://arxiv.org/abs/2602.01608
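The closed loop described in the abstract (plan, instantiate a visual thought, critique, refine) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the names collaborative_loop, toy_plan, toy_render, toy_critic and Verdict are hypothetical, and the three toy functions are trivial stand-ins for an autoregressive planner, a diffusion generator and a vision-based critic. Here a "visual thought" is modeled as nothing more than the set of constraints the generated image is assumed to depict.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class Verdict:
    ok: bool
    feedback: Optional[str] = None  # one constraint the critic found unmet


def toy_plan(task: List[str], feedback: List[str]) -> List[str]:
    # Hypothetical stand-in for the autoregressive planner: start from the
    # first requirement and add every constraint the critic has flagged.
    first = task[:1]
    return first + [c for c in feedback if c not in first]


def toy_render(constraints: List[str]) -> FrozenSet[str]:
    # Hypothetical stand-in for the diffusion model: the "visual thought"
    # is reduced to the set of constraints it is assumed to depict.
    return frozenset(constraints)


def toy_critic(task: List[str], thought: FrozenSet[str]) -> Verdict:
    # Hypothetical stand-in for the vision-based critic: report the first
    # requirement that the visual thought does not satisfy.
    missing = [c for c in task if c not in thought]
    return Verdict(ok=not missing, feedback=missing[0] if missing else None)


def collaborative_loop(
    task: List[str], max_rounds: int = 5
) -> Tuple[FrozenSet[str], int, bool]:
    """Run plan -> render -> critique until the critic accepts or the
    round budget is used up. Returns (thought, rounds_used, accepted)."""
    feedback: List[str] = []
    thought: FrozenSet[str] = frozenset()
    for round_no in range(1, max_rounds + 1):
        constraints = toy_plan(task, feedback)
        thought = toy_render(constraints)
        verdict = toy_critic(task, thought)
        if verdict.ok:
            return thought, round_no, True
        # Feed the critic's finding back into the next planning step.
        feedback.append(verdict.feedback)
    return thought, max_rounds, False


if __name__ == "__main__":
    task = ["cube left of sphere", "sphere on table", "light from above"]
    thought, rounds, accepted = collaborative_loop(task)
    print(sorted(thought), rounds, accepted)
```

With the three-constraint task above, the toy planner picks up one missing constraint per round, so the loop is accepted on the third round. The same loop is used whether the final output is an answer or an image, which mirrors the abstract's claim that the collaborative loop is task-independent; everything else in the sketch is an assumption made for illustration.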