Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Yu, Zhengyang, Hayakawa, Akio, Ishii, Masato, Yu, Qingtao, Shibuya, Takashi, Zhang, Jing, Mitsufuji, Yuki
Format:	Preprint
Publié:	2025
Sujets:	Computer Vision and Pattern Recognition
Accès en ligne:	https://arxiv.org/abs/2512.11203
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866917146433421312
author	Yu, Zhengyang Hayakawa, Akio Ishii, Masato Yu, Qingtao Shibuya, Takashi Zhang, Jing Mitsufuji, Yuki
author_facet	Yu, Zhengyang Hayakawa, Akio Ishii, Masato Yu, Qingtao Shibuya, Takashi Zhang, Jing Mitsufuji, Yuki
contents	Autoregressive video diffusion models (AR-VDMs) show strong promise as scalable alternatives to bidirectional VDMs, enabling real-time and interactive applications. Yet there remains room for improvement in their sample fidelity. A promising solution is inference-time alignment, which optimizes the noise space to improve sample fidelity without updating model parameters. Yet, optimization- or search-based methods are computationally impractical for AR-VDMs. Recent text-to-image (T2I) works address this via feedforward noise refiners that modulate sampled noises in a single forward pass. Can such noise refiners be extended to AR-VDMs? We identify the failure of naively extending T2I noise refiners to AR-VDMs and propose AutoRefiner-a noise refiner tailored for AR-VDMs, with two key designs: pathwise noise refinement and a reflective KV-cache. Experiments demonstrate that AutoRefiner serves as an efficient plug-in for AR-VDMs, effectively enhancing sample fidelity by refining noise along stochastic denoising paths.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_11203
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path Yu, Zhengyang Hayakawa, Akio Ishii, Masato Yu, Qingtao Shibuya, Takashi Zhang, Jing Mitsufuji, Yuki Computer Vision and Pattern Recognition Autoregressive video diffusion models (AR-VDMs) show strong promise as scalable alternatives to bidirectional VDMs, enabling real-time and interactive applications. Yet there remains room for improvement in their sample fidelity. A promising solution is inference-time alignment, which optimizes the noise space to improve sample fidelity without updating model parameters. Yet, optimization- or search-based methods are computationally impractical for AR-VDMs. Recent text-to-image (T2I) works address this via feedforward noise refiners that modulate sampled noises in a single forward pass. Can such noise refiners be extended to AR-VDMs? We identify the failure of naively extending T2I noise refiners to AR-VDMs and propose AutoRefiner-a noise refiner tailored for AR-VDMs, with two key designs: pathwise noise refinement and a reflective KV-cache. Experiments demonstrate that AutoRefiner serves as an efficient plug-in for AR-VDMs, effectively enhancing sample fidelity by refining noise along stochastic denoising paths.
title	AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.11203

Documents similaires