Table of Contents :: Library Catalog


Bibliographic Details
Main Authors:	Cui, Hanshuai; Tang, Zhiqing; Yao, Zhi; Meng, Fanshuai; Jia, Weijia; Zhao, Wei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.02979

Table of Contents:

Autoregressive (AR) video diffusion models enable long-form video generation but remain expensive due to repeated multi-step denoising. Existing training-free acceleration methods rely on binary cache-or-recompute decisions, overlooking intermediate cases where direct reuse is too coarse yet full recomputation is unnecessary. Moreover, asynchronous AR schedules assign different noise levels to co-generated frames, yet existing methods process the entire valid interval uniformly. To address these AR-specific inefficiencies, we present SCOPE, a training-free framework for efficient AR video diffusion. SCOPE introduces a tri-modal scheduler over cache, predict, and recompute, where prediction via noise-level Taylor extrapolation fills the gap between reuse and recomputation with explicit stability controls backed by error propagation analysis. It further introduces selective computation that restricts execution to the active frame interval. On MAGI-1 and SkyReels-V2, SCOPE achieves up to 4.73x speedup while maintaining quality comparable to the original output, outperforming all training-free baselines.
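The two mechanisms named in the abstract (a tri-modal cache / predict / recompute scheduler driven by noise-level Taylor extrapolation, and selective computation over the active frame interval) can be illustrated with a minimal sketch. It is not SCOPE's implementation. Everything here is an assumption for illustration: the names `TriModalScheduler` and `active_interval`, the thresholds `cache_tol`, `predict_tol` and `max_skips`, the first-order finite-difference Taylor step, and the rule that a frame is "active" when its scheduled noise level changes. SCOPE derives its stability controls from an error propagation analysis, while this sketch substitutes a simple cap on consecutive skipped steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Vector = List[float]


@dataclass
class TriModalScheduler:
    """Hypothetical cache / predict / recompute scheduler for one denoising block.

    All thresholds are illustrative assumptions, not values taken from SCOPE.
    """
    cache_tol: float = 0.05    # reuse the last output if the extrapolated change is this small
    predict_tol: float = 0.25  # use the Taylor prediction if the change is below this
    max_skips: int = 2         # stability cap: force a recompute after this many skipped steps
    history: List[Tuple[float, Vector]] = field(default_factory=list)  # (noise level, output) of full computes
    skips: int = 0

    def _extrapolate(self, sigma: float) -> Optional[Vector]:
        """First-order Taylor step in the noise level, using a finite-difference derivative."""
        if len(self.history) < 2:
            return None
        (s0, y0), (s1, y1) = self.history[-2], self.history[-1]
        if s1 == s0:
            return None
        h = sigma - s1
        return [b + (b - a) / (s1 - s0) * h for a, b in zip(y0, y1)]

    def step(self, sigma: float, compute: Callable[[float], Vector]) -> Tuple[str, Vector]:
        """Return (mode, output) for noise level sigma; compute() runs the full block."""
        pred = self._extrapolate(sigma)
        if pred is not None and self.skips < self.max_skips:
            last = self.history[-1][1]
            scale = max(sum(v * v for v in last) ** 0.5, 1e-12)
            # Relative change the extrapolation expects versus the last computed output.
            rel_change = sum((p - l) ** 2 for p, l in zip(pred, last)) ** 0.5 / scale
            if rel_change <= self.cache_tol:
                self.skips += 1
                return "cache", list(last)
            if rel_change <= self.predict_tol:
                self.skips += 1
                return "predict", pred
        out = compute(sigma)
        self.history = (self.history + [(sigma, out)])[-2:]  # keep the two most recent anchors
        self.skips = 0
        return "recompute", out


def active_interval(levels_now: List[float], levels_next: List[float]) -> Optional[Tuple[int, int]]:
    """Half-open [start, end) span of frames whose noise level changes at this step.

    Assumption: frames outside this span would be skipped instead of recomputed.
    """
    idx = [i for i, (a, b) in enumerate(zip(levels_now, levels_next)) if a != b]
    if not idx:
        return None
    return idx[0], idx[-1] + 1


def toy_block(sigma: float) -> Vector:
    """Toy stand-in for a transformer block whose output is linear in the noise level."""
    return [2.0 * sigma, 1.0]


if __name__ == "__main__":
    sched = TriModalScheduler()
    print([sched.step(s, toy_block)[0] for s in (1.0, 0.9, 0.8, 0.7, 0.6)])
    print(active_interval([0.0, 0.0, 0.3, 0.6, 1.0, 1.0], [0.0, 0.0, 0.1, 0.4, 0.8, 1.0]))
```

With the toy block, the first two steps must recompute to obtain two anchors, the next two are served by extrapolation, and the cap then forces a recompute, so the demo prints `['recompute', 'recompute', 'predict', 'predict', 'recompute']`. The interval call returns `(2, 5)`, because only frames 2 to 4 change noise level under that staggered schedule. Bounding consecutive skips is one simple way to stop extrapolation error from compounding. It is a placeholder for whatever controls the paper actually derives.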
