Staff View: Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot :: Library Catalog

Bibliographic Details
Main Authors:	Ortega, Jorge Chang; Lan, Bastien Le; Serre, Thomas; Boutin, Victor
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.23819

_version_	1866913156994957312
author	Ortega, Jorge Chang; Lan, Bastien Le; Serre, Thomas; Boutin, Victor
author_facet	Ortega, Jorge Chang; Lan, Bastien Le; Serre, Thomas; Boutin, Victor
contents	A central question in computational vision is whether human-like visual representations are better explained by discriminative or generative learning. Existing comparisons, however, often confound the learning objective with architecture, scale, and training data, leaving open whether the objective itself drives alignment. We address this confound using Joint Energy-Based Models (JEMs), which interpolate continuously between discriminative and generative training within a fixed architecture. By varying a single mixing coefficient, we isolate the effect of the learning objective and evaluate the resulting models across six human-alignment benchmarks spanning perceptual similarity, gloss perception, human response uncertainty, robustness, shape-texture cue conflict, and diagnostic feature attribution. Across this diverse suite, human alignment is consistently maximized at intermediate points of the generative-discriminative continuum, rather than at either endpoint. Hybrid JEMs combine the categorical structure induced by discriminative learning with the sensitivity to input structure induced by generative learning, yielding more human-like behavior across multiple levels of vision. These results suggest that the generative-discriminative dichotomy is the wrong axis for understanding human-aligned vision: alignment emerges not from choosing one objective over the other, but from balancing both.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_23819
institution	arXiv
publishDate	2026
record_format	arxiv
title	Not Too Generative, Not Too Discriminative: The Human Alignment Sweet Spot
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2605.23819