Staff View: Representation Fréchet Loss for Visual Generation :: Library Catalog

Bibliographic Details
Main Authors:	Yang, Jiawei; Geng, Zhengyang; Ju, Xuan; Tian, Yonglong; Wang, Yue
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.28190

_version_	1866911637089288192
author	Yang, Jiawei; Geng, Zhengyang; Ju, Xuan; Tian, Yonglong; Wang, Yue
author_facet	Yang, Jiawei; Geng, Zhengyang; Ju, Xuan; Tian, Yonglong; Wang, Yue
contents	We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr$^k$, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_28190
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Representation Fréchet Loss for Visual Generation Yang, Jiawei Geng, Zhengyang Ju, Xuan Tian, Yonglong Wang, Yue Computer Vision and Pattern Recognition We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr$^k$, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
title	Representation Fréchet Loss for Visual Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.28190
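
Note on the Fréchet Distance named in the abstract: the sketch below shows only the standard closed-form Fréchet distance between Gaussians fitted to two feature sets, ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 (cov_a cov_b)^(1/2)), which is the quantity FID computes on Inception features. It is not the authors' FD-loss: this record does not describe how the population estimate and the gradient batch are decoupled, so that part is not reproduced. The function name `frechet_distance` and the random arrays standing in for representation features are illustrative assumptions.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory;
    # the real part is taken and clipped to absorb numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Hypothetical stand-ins for features of real and generated images.
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))
fake = rng.normal(loc=0.5, size=(2000, 8))

fd_same = frechet_distance(real, real)   # identical sets: distance near zero
fd_shift = frechet_distance(real, fake)  # shifted mean: clearly positive
```

The estimate depends on how many samples are used to fit the mean and covariance, which is presumably why the abstract distinguishes a large population (e.g., 50k) for FD estimation from the smaller batch used for gradients.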

Similar Items