Bibliographic Details
Main Authors: Vardhana, Korada Sri, Lolla, Shrikrishna, Biswas, Soma
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2512.03749
author Vardhana, Korada Sri
Lolla, Shrikrishna
Biswas, Soma
contents Text-to-image (T2I) diffusion models have achieved widespread success due to their ability to generate high-resolution, photorealistic images. These models are trained on large-scale datasets, such as LAION-5B, that are often scraped from the internet. Because this data contains numerous biases, the models learn and reproduce them, resulting in stereotypical outputs. We introduce SelfDebias, a fully unsupervised test-time debiasing method applicable to any diffusion model that uses a UNet as its noise predictor. SelfDebias identifies semantic clusters in an image encoder's embedding space and uses these clusters to guide the diffusion process during inference, minimizing the KL divergence between the output distribution and the uniform distribution. Unlike supervised approaches, SelfDebias does not require human-annotated datasets or external classifiers trained for each generated concept. Instead, it is designed to identify semantic modes automatically. Extensive experiments show that SelfDebias generalizes across prompts and diffusion model architectures, including both conditional and unconditional models. It effectively debiases images not only along key demographic dimensions but also along more abstract concepts for which identifying biases is itself challenging, while maintaining the visual fidelity of the generated images.
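The objective named in the abstract, the KL divergence between the output distribution and the uniform distribution, can be illustrated with a small sketch. This is not the authors' implementation: the function name `kl_to_uniform`, the direction KL(p || u), and the use of per-cluster counts as the output distribution are all assumptions made here for illustration. For K clusters the quantity reduces to log K minus the entropy of p, so it is zero exactly when the generated images are spread evenly across clusters.

```python
import math


def kl_to_uniform(cluster_weights):
    """KL(p || u) for a discrete distribution p over K clusters, with u uniform.

    Hypothetical helper for illustration only; the direction of the
    divergence is an assumption, since the abstract does not specify it.
    """
    k = len(cluster_weights)
    total = sum(cluster_weights)
    # Normalize so raw counts or unnormalized weights are accepted.
    p = [w / total for w in cluster_weights]
    # Empty clusters contribute 0 by the usual 0 * log 0 = 0 convention.
    return sum(pi * math.log(pi * k) for pi in p if pi > 0)


# Hypothetical counts of generated images per semantic cluster.
balanced = kl_to_uniform([25, 25, 25, 25])
skewed = kl_to_uniform([85, 5, 5, 5])
```

Under these assumptions, `balanced` evaluates to zero and `skewed` is strictly positive, so a guidance signal that lowers this value pushes the generated set toward even coverage of the discovered clusters.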
format Preprint
id arxiv_https___arxiv_org_abs_2512_03749
institution arXiv
publishDate 2025
record_format arxiv
title Fully Unsupervised Self-debiasing of Text-to-Image Diffusion Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2512.03749