Bibliographic Details
Main Authors:	Yue, Zhixiong; Ni, Zixuan; Ye, Feiyang; Zhang, Jinshan; Shen, Sheng; Mi, Zhenpeng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01591

_version_	1866912868163649536
author	Yue, Zhixiong; Ni, Zixuan; Ye, Feiyang; Zhang, Jinshan; Shen, Sheng; Mi, Zhenpeng
author_facet	Yue, Zhixiong; Ni, Zixuan; Ye, Feiyang; Zhang, Jinshan; Shen, Sheng; Mi, Zhenpeng
contents	Recent advances in flow matching models, particularly with reinforcement learning (RL), have significantly enhanced human preference alignment in few-step text-to-image generators. However, existing RL-based approaches for flow matching models typically rely on numerous denoising steps and suffer from sparse, imprecise reward signals that often lead to suboptimal alignment. To address these limitations, we propose Temperature-Annealed Few-step Sampling with Group Relative Policy Optimization (TAFS-GRPO), a novel framework for training flow matching text-to-image models into efficient few-step generators that are well aligned with human preferences. Our method iteratively injects adaptive temporal noise into the results of one-step samples. By repeatedly annealing the model's sampled outputs, it introduces stochasticity into the sampling process while preserving the semantic integrity of each generated image. Moreover, its step-aware advantage integration mechanism incorporates GRPO, which avoids the need for a differentiable reward function and provides dense, step-specific rewards for stable policy optimization. Extensive experiments demonstrate that TAFS-GRPO achieves strong performance in few-step text-to-image generation and significantly improves the alignment of generated images with human preferences. The code and models of this work will be made available to facilitate further research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01591
institution	arXiv
publishDate	2026
record_format	arxiv
title	Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.01591

Similar Items