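Staff View: Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation :: Library Catalog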

Bibliographic Details
Main Authors:	Wang, Junyan; Sun, Zhenhong; Tan, Zhiyu; Chen, Xuanbai; Chen, Weihua; Li, Hao; Zhang, Cheng; Song, Yang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2403.05239

_version_	1866929268661944320
author	Wang, Junyan; Sun, Zhenhong; Tan, Zhiyu; Chen, Xuanbai; Chen, Weihua; Li, Hao; Zhang, Cheng; Song, Yang
author_facet	Wang, Junyan; Sun, Zhenhong; Tan, Zhiyu; Chen, Xuanbai; Chen, Weihua; Li, Hao; Zhang, Cheng; Song, Yang
contents	Vanilla text-to-image diffusion models struggle with generating accurate human images, commonly resulting in imperfect anatomies such as unnatural postures or disproportionate limbs. Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls -- human-centric priors such as pose or depth maps -- during the image generation phase. This paper explores the integration of these human-centric priors directly into the model fine-tuning stage, essentially eliminating the need for extra conditions at the inference stage. We realize this idea by proposing a human-centric alignment loss to strengthen human-related information from the textual prompts within the cross-attention maps. To ensure semantic detail richness and human structural accuracy during fine-tuning, we introduce scale-aware and step-wise constraints within the diffusion process, according to an in-depth analysis of the cross-attention layer. Extensive experiments show that our method largely improves over state-of-the-art text-to-image models to synthesize high-quality human images based on user-written prompts. Project page: https://hcplayercvpr2024.github.io
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_05239
institution	arXiv
publishDate	2024
record_format	arxiv
title	Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2403.05239

Similar Items