Staff View: Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn Guidance :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yilun; He, Minggui; Yao, Feiyu; Ji, Yuhe; Tao, Shimin; Du, Jingzhou; Li, Duan; Gao, Jian; Zhang, Li; Yang, Hao; Chen, Boxing; Yoshie, Osamu
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.12910

_version_	1866909842605604864
author	Liu, Yilun; He, Minggui; Yao, Feiyu; Ji, Yuhe; Tao, Shimin; Du, Jingzhou; Li, Duan; Gao, Jian; Zhang, Li; Yang, Hao; Chen, Boxing; Yoshie, Osamu
author_facet	Liu, Yilun; He, Minggui; Yao, Feiyu; Ji, Yuhe; Tao, Shimin; Du, Jingzhou; Li, Duan; Gao, Jian; Zhang, Li; Yang, Hao; Chen, Boxing; Yoshie, Osamu
contents	The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models are sensitive to textual prompts, posing a challenge for novice users who may not be familiar with TIS prompt writing. Existing solutions alleviate this through automatic prompt expansion or generation from a user query. However, this single-turn approach suffers from limited user-centricity in terms of result interpretability and user interactivity. Thus, we propose DialPrompt, a dialogue-based TIS prompt generation model that emphasizes user experience for novice users. DialPrompt is designed to follow a multi-turn workflow, where in each round of dialogue the model guides the user to express their preferences on possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt improves user-centricity by allowing users to perceive and control the creation process of TIS prompts. Experiments indicate that DialPrompt achieves a significantly higher user-centricity score than existing approaches while maintaining competitive quality of synthesized images. In our user evaluation, DialPrompt is highly rated by 19 human reviewers (especially novices).
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_12910
institution	arXiv
publishDate	2024
record_format	arxiv
title	Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn Guidance
topic	Artificial Intelligence
url	https://arxiv.org/abs/2408.12910
