Staff View: Evaluating Generative Models via One-Dimensional Code Distributions :: Library Catalog

Bibliographic Details
Main Authors:	Jia, Zexi; Luo, Pengcheng; Zhong, Yijia; Zhang, Jinchao; Zhou, Jie
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.08064

_version_	1866914387440173056
author	Jia, Zexi Luo, Pengcheng Zhong, Yijia Zhang, Jinchao Zhou, Jie
author_facet	Jia, Zexi Luo, Pengcheng Zhong, Yijia Zhang, Jinchao Zhou, Jie
contents	Most evaluations of generative models rely on feature-distribution metrics such as FID, which operate on continuous recognition features that are explicitly trained to be invariant to appearance variations, and thus discard cues critical for perceptual quality. We instead evaluate models in the space of discrete visual tokens, where modern 1D image tokenizers compactly encode both semantic and perceptual information and quality manifests as predictable token statistics. We introduce Codebook Histogram Distance (CHD), a training-free distribution metric in token space, and Code Mixture Model Score (CMMS), a no-reference quality metric learned from synthetic degradations of token sequences. To stress-test metrics under broad distribution shifts, we further propose VisForm, a benchmark of 210K images spanning 62 visual forms and 12 generative models with expert annotations. Across AGIQA, HPDv2/3, and VisForm, our token-based metrics achieve state-of-the-art correlation with human judgments. We will release all code and datasets to facilitate future research, with the code publicly available at https://github.com/zexiJia/1d-Distance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_08064
institution	arXiv
publishDate	2026
record_format	arxiv
title	Evaluating Generative Models via One-Dimensional Code Distributions
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.08064
