Bibliographic Details
Main Authors: Xiang, Xunzhi, Chen, Yabo, Zhang, Guiyu, Wang, Zhongyu, Gao, Zhe, Xiang, Quanming, Shang, Gonghu, Liu, Junqi, Huang, Haibin, Gao, Yang, Zhang, Chi, Fan, Qi, Li, Xuelong
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2508.03334
_version_ 1866911208065466368
author Xiang, Xunzhi
Chen, Yabo
Zhang, Guiyu
Wang, Zhongyu
Gao, Zhe
Xiang, Quanming
Shang, Gonghu
Liu, Junqi
Huang, Haibin
Gao, Yang
Zhang, Chi
Fan, Qi
Li, Xuelong
contents Current autoregressive diffusion models excel at video generation but are generally limited to short temporal durations. Our theoretical analysis indicates that autoregressive modeling typically suffers from temporal drift caused by error accumulation, and that its sequential nature hinders parallelization in long video synthesis. To address these limitations, we propose a novel planning-then-populating framework centered on Macro-from-Micro Planning (MMPL) for long video generation. MMPL sketches a global storyline for the entire video through two hierarchical stages: Micro Planning and Macro Planning. Specifically, Micro Planning predicts a sparse set of future keyframes within each short video segment, offering motion and appearance priors to guide high-quality video segment generation. Macro Planning extends the in-segment keyframe planning across the entire video through an autoregressive chain of micro plans, ensuring long-term consistency across video segments. Subsequently, MMPL-based Content Populating generates all intermediate frames in parallel across segments, enabling efficient parallelization of autoregressive generation. The parallelization is further optimized by Adaptive Workload Scheduling for balanced GPU execution and accelerated autoregressive video generation. Extensive experiments confirm that our method outperforms existing long video generation models in quality and stability. Generated videos and comparison results are available on our project page.
format Preprint
id arxiv_https___arxiv_org_abs_2508_03334
institution arXiv
publishDate 2025
record_format arxiv
title Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2508.03334
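
The planning-then-populating flow described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: a "frame" is a single float, `micro_plan` is a hypothetical stand-in for the keyframe-predicting model, `populate` uses linear interpolation in place of diffusion-based frame generation, and the constants `SEGMENT_LEN` and `KEY_OFFSETS` are arbitrary. Adaptive Workload Scheduling is not modeled; a plain thread pool stands in for multi-GPU execution.

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_LEN = 8        # frames per segment (assumed value)
KEY_OFFSETS = (3, 7)   # sparse keyframe positions inside a segment (assumed)


def micro_plan(start_frame):
    """Toy Micro Planning: predict sparse keyframes for one segment.

    Hypothetical stand-in for the real model; it simply advances the
    scalar 'frame' by 0.5 per time step.
    """
    return {off: start_frame + 0.5 * (off + 1) for off in KEY_OFFSETS}


def macro_plan(first_frame, num_segments):
    """Toy Macro Planning: an autoregressive chain of micro plans.

    The last keyframe of each segment seeds the plan of the next one,
    so only the sparse keyframes are produced sequentially.
    """
    plans, start = [], first_frame
    for _ in range(num_segments):
        keys = micro_plan(start)
        plans.append((start, keys))
        start = keys[KEY_OFFSETS[-1]]
    return plans


def populate(plan):
    """Toy Content Populating: fill in every frame of one segment.

    Uses only this segment's own plan, so segments are independent.
    Linear interpolation replaces the actual generative model.
    """
    start, keys = plan
    anchors = [(-1, start)] + sorted(keys.items())
    frames = []
    for (i0, v0), (i1, v1) in zip(anchors, anchors[1:]):
        for i in range(i0 + 1, i1 + 1):
            frames.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    return frames


def generate(first_frame, num_segments, workers=4):
    """Plan sequentially, then populate all segments in parallel."""
    plans = macro_plan(first_frame, num_segments)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        segments = list(pool.map(populate, plans))  # map preserves order
    return [frame for segment in segments for frame in segment]


if __name__ == "__main__":
    video = generate(0.0, num_segments=3)
    print(len(video), video[:4])
```

The point of the sketch is the dependency structure: only the short keyframe chain in `macro_plan` runs sequentially, while each `populate` call depends on nothing outside its own segment plan and can therefore run concurrently.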