Bibliographic Details
Main Authors: Huang, Zhihao; Qiu, Xi; Ma, Yukuo; Zhou, Yifu; Chen, Junjie; Zhang, Hongyuan; Zhang, Chi; Li, Xuelong
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2503.07076
Abstract:
  • Autoregressive models have achieved significant success in image generation. However, standard autoregressive methods typically generate pixels sequentially in a fixed spatial order, which ignores the inherent hierarchical structure of image information in the spectral domain. To better leverage this spectral hierarchy, we introduce Next-Frequency Image Generation (NFIG), a novel framework that decomposes the image generation process into multiple frequency-guided stages. NFIG aligns generation with natural image structure by first generating low-frequency components, which capture global structure with significantly fewer tokens, and then progressively adding higher-frequency details. This frequency-aware paradigm not only improves the quality of generated images but also reduces inference cost, because global structure is established early on. Extensive experiments on the ImageNet-256 benchmark validate NFIG's effectiveness, demonstrating superior performance (FID: 2.81) and a notable 1.25x speedup compared to the strong baseline VAR-d20. The source code is available at https://github.com/Pride-Huang/NFIG.
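The low-to-high frequency ordering described in the abstract can be illustrated with a plain Fourier band split. The following is a minimal sketch under stated assumptions, not the NFIG implementation: it works on a 2D grayscale array rather than on tokens, and the function names (`split_into_bands`, `progressive_reconstructions`) and the cutoff values are hypothetical choices made for this illustration. It shows only that an image can be written as a sum of frequency bands, so that adding bands from low to high refines a coarse global structure into the full image.

```python
import numpy as np

def radial_frequency_grid(height, width):
    """Radial spatial frequency (cycles per pixel) for each FFT coefficient."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)

def split_into_bands(image, cutoffs):
    """Split a 2D image into len(cutoffs) + 1 frequency bands, low to high.

    `cutoffs` are hypothetical band edges in cycles per pixel, in
    increasing order. The bands partition the spectrum, so they sum
    back to the original image.
    """
    spectrum = np.fft.fft2(image)
    radius = radial_frequency_grid(*image.shape)
    edges = [0.0, *cutoffs, np.inf]
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)  # keep one ring of frequencies
        bands.append(np.fft.ifft2(spectrum * mask).real)
    return bands

def progressive_reconstructions(bands):
    """Cumulative sums of the bands: coarse structure first, details added later."""
    return list(np.cumsum(bands, axis=0))
```

As a usage example, calling `split_into_bands(img, [0.05, 0.15])` on a 2D array `img` yields three bands, and the last entry of `progressive_reconstructions(...)` matches `img` up to floating-point error, with each earlier entry being a smoother approximation.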