Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Li, Daiqing, Kamko, Aleks, Akhgari, Ehsan, Sabet, Ali, Xu, Linmiao, Doshi, Suhail
Format:	Preprint
Publié:	2024
Sujets:	Computer Vision and Pattern Recognition Artificial Intelligence
Accès en ligne:	https://arxiv.org/abs/2402.17245
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866929257866854400
author	Li, Daiqing Kamko, Aleks Akhgari, Ehsan Sabet, Ali Xu, Linmiao Doshi, Suhail
author_facet	Li, Daiqing Kamko, Aleks Akhgari, Ehsan Sabet, Ali Xu, Linmiao Doshi, Suhail
contents	In this work, we share three insights for achieving state-of-the-art aesthetic quality in text-to-image generative models. We focus on three critical aspects for model improvement: enhancing color and contrast, improving generation across multiple aspect ratios, and improving human-centric fine details. First, we delve into the significance of the noise schedule in training a diffusion model, demonstrating its profound impact on realism and visual fidelity. Second, we address the challenge of accommodating various aspect ratios in image generation, emphasizing the importance of preparing a balanced bucketed dataset. Lastly, we investigate the crucial role of aligning model outputs with human preferences, ensuring that generated images resonate with human perceptual expectations. Through extensive analysis and experiments, Playground v2.5 demonstrates state-of-the-art performance in terms of aesthetic quality under various conditions and aspect ratios, outperforming both widely-used open-source models like SDXL and Playground v2, and closed-source commercial systems such as DALLE 3 and Midjourney v5.2. Our model is open-source, and we hope the development of Playground v2.5 provides valuable guidelines for researchers aiming to elevate the aesthetic quality of diffusion-based image generation models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_17245
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation Li, Daiqing Kamko, Aleks Akhgari, Ehsan Sabet, Ali Xu, Linmiao Doshi, Suhail Computer Vision and Pattern Recognition Artificial Intelligence In this work, we share three insights for achieving state-of-the-art aesthetic quality in text-to-image generative models. We focus on three critical aspects for model improvement: enhancing color and contrast, improving generation across multiple aspect ratios, and improving human-centric fine details. First, we delve into the significance of the noise schedule in training a diffusion model, demonstrating its profound impact on realism and visual fidelity. Second, we address the challenge of accommodating various aspect ratios in image generation, emphasizing the importance of preparing a balanced bucketed dataset. Lastly, we investigate the crucial role of aligning model outputs with human preferences, ensuring that generated images resonate with human perceptual expectations. Through extensive analysis and experiments, Playground v2.5 demonstrates state-of-the-art performance in terms of aesthetic quality under various conditions and aspect ratios, outperforming both widely-used open-source models like SDXL and Playground v2, and closed-source commercial systems such as DALLE 3 and Midjourney v5.2. Our model is open-source, and we hope the development of Playground v2.5 provides valuable guidelines for researchers aiming to elevate the aesthetic quality of diffusion-based image generation models.
title	Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2402.17245

Documents similaires