Staff View: Discovering Divergent Representations between Text-to-Image Models :: Library Catalog

Bibliographic Details
Main Authors:	Dunlap, Lisa; Gonzalez, Joseph E.; Darrell, Trevor; Heilbron, Fabian Caba; Sivic, Josef; Russell, Bryan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.08940

_version_	1866908531607732224
author	Dunlap, Lisa; Gonzalez, Joseph E.; Darrell, Trevor; Heilbron, Fabian Caba; Sivic, Josef; Russell, Bryan
author_facet	Dunlap, Lisa; Gonzalez, Joseph E.; Darrell, Trevor; Heilbron, Fabian Caba; Sivic, Josef; Russell, Bryan
contents	In this paper, we investigate when and how visual representations learned by two different generative models diverge. Given two text-to-image models, our goal is to discover visual attributes that appear in images generated by one model but not the other, along with the types of prompts that trigger these attribute differences. For example, "flames" might appear in one model's outputs when given prompts expressing strong emotions, while the other model does not produce this attribute given the same prompts. We introduce CompCon (Comparing Concepts), an evolutionary search algorithm that discovers visual attributes more prevalent in one model's output than the other, and uncovers the prompt concepts linked to these visual differences. To evaluate CompCon's ability to find diverging representations, we create an automated data generation pipeline to produce ID2, a dataset of 60 input-dependent differences, and compare our approach to several LLM- and VLM-powered baselines. Finally, we use CompCon to compare popular text-to-image models, finding divergent representations such as how PixArt depicts prompts mentioning loneliness with wet streets and Stable Diffusion 3.5 depicts African American people in media professions. Code at: https://github.com/adobe-research/CompCon
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_08940
institution	arXiv
publishDate	2025
record_format	arxiv
title	Discovering Divergent Representations between Text-to-Image Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2509.08940
