Bibliographic Details
Main Authors: Ma, Shichao; Guo, Yunhe; Su, Jiahao; Huang, Qihe; Zhou, Zhengyang; Wang, Yang
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2508.06916
Summary:
  • Text-to-image generation has driven remarkable advances in diverse media applications, yet most systems focus on single-turn scenarios and struggle with iterative, multi-turn creative tasks. Recent dialogue-based systems attempt to bridge this gap, but their single-agent, sequential paradigm often causes intention drift and incoherent edits. To address these limitations, we present Talk2Image, a novel multi-agent system for interactive image generation and editing in multi-turn dialogue scenarios. Our approach integrates three key components: intention parsing from dialogue history, task decomposition and collaborative execution across specialized agents, and feedback-driven refinement based on a multi-view evaluation mechanism. Talk2Image enables step-by-step alignment with user intention and consistent image editing. Experiments demonstrate that Talk2Image outperforms existing baselines in controllability, coherence, and user satisfaction across iterative image generation and editing tasks.