Bibliographic Details
Main Authors: Yang, Jiawei, Geng, Zhengyang, Ju, Xuan, Tian, Yonglong, Wang, Yue
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2604.28190
author Yang, Jiawei
Geng, Zhengyang
Ju, Xuan
Tian, Yonglong
Wang, Yue
contents We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr$^k$, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
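The FD-loss idea above hinges on estimating FD over a large population while gradients would flow only through a much smaller batch. The NumPy code below is a minimal sketch of that bookkeeping, not the authors' implementation. It uses the standard Gaussian Fréchet distance, ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}). The helpers `gaussian_stats`, `frechet_distance`, and `population_fd` are hypothetical names. It assumes the population is a FIFO bank of cached generated features into which each new batch is swapped. Because NumPy has no autodiff, the stop-gradient on the cached rows is only marked in a comment. The paper's actual mechanism may differ.

```python
import numpy as np


def gaussian_stats(feats: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Mean and unbiased covariance of an (N, D) feature matrix."""
    return feats.mean(axis=0), np.cov(feats, rowvar=False)


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, cov1, mu2, cov2) -> float:
    """FD between N(mu1, cov1) and N(mu2, cov2).

    Tr((cov1 cov2)^{1/2}) equals the sum of square roots of the eigenvalues
    of the symmetric PSD matrix cov1^{1/2} cov2 cov1^{1/2}, which is more
    stable than taking a general matrix square root of cov1 @ cov2.
    """
    root1 = _psd_sqrt(cov1)
    middle = root1 @ cov2 @ root1
    middle = 0.5 * (middle + middle.T)  # enforce symmetry numerically
    eig = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    tr_covmean = np.sqrt(eig).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_covmean)


def population_fd(bank: np.ndarray, batch: np.ndarray,
                  real_mu: np.ndarray, real_cov: np.ndarray):
    """Hypothetical decoupled estimate (assumption, not from the paper).

    Statistics come from a population of size len(bank): the oldest
    len(batch) cached rows are dropped and the fresh batch is appended.
    In an autodiff framework only `batch` would carry gradients; the cached
    rows would be treated as constants (stop-gradient).
    Returns (fd, updated_bank).
    """
    b = batch.shape[0]
    population = np.concatenate([bank[b:], batch], axis=0)  # size stays N
    mu_g, cov_g = gaussian_stats(population)
    return frechet_distance(mu_g, cov_g, real_mu, real_cov), population


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    real_mu, real_cov = gaussian_stats(rng.normal(0.0, 1.0, size=(5000, dim)))
    bank = rng.normal(0.5, 1.0, size=(2000, dim))   # cached generated features
    batch = rng.normal(0.1, 1.0, size=(128, dim))   # freshly generated batch
    fd, bank = population_fd(bank, batch, real_mu, real_cov)
    print(f"population FD: {fd:.4f}")
```

Here the population (2000 rows) is much larger than the batch (128 rows), mirroring the 50k versus 1024 split mentioned in the abstract at toy scale. The eigenvalue route for the trace term avoids a non-symmetric matrix square root.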
format Preprint
id arxiv_https___arxiv_org_abs_2604_28190
institution arXiv
publishDate 2026
record_format arxiv
title Representation Fréchet Loss for Visual Generation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2604.28190