Staff View: PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement

Bibliographic Details
Main Authors:	Jamil, Sofia; Reddy, Bollampalli Areen; Kumar, Raghvendra; Saha, Sriparna; Goswami, Koustava; Joseph, K. J.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.13708

_version_	1866913954946613248
author	Jamil, Sofia; Reddy, Bollampalli Areen; Kumar, Raghvendra; Saha, Sriparna; Goswami, Koustava; Joseph, K. J.
author_facet	Jamil, Sofia; Reddy, Bollampalli Areen; Kumar, Raghvendra; Saha, Sriparna; Goswami, Koustava; Joseph, K. J.
contents	Recent advancements in text-to-image diffusion models have achieved remarkable success in generating realistic and diverse visual content. A critical factor in this process is the model's ability to accurately interpret textual prompts. However, these models often struggle with creative expressions, particularly those involving complex, abstract, or highly descriptive language. In this work, we introduce a novel training-free approach tailored to improve image generation for a unique form of creative language: poetic verse, which frequently features layered, abstract, and dual meanings. Our proposed PoemTale Diffusion approach aims to minimise the information that is lost during poetic text-to-image conversion by integrating a multi-stage prompt refinement loop into language models to enhance the interpretability of poetic texts. To support this, we adapt existing state-of-the-art diffusion models by modifying their self-attention mechanisms with a consistent self-attention technique to generate multiple consistent images, which are then collectively used to convey the poem's meaning. Moreover, to encourage research in the field of poetry, we introduce the P4I (PoemForImage) dataset, consisting of 1111 poems sourced from multiple online and offline resources. We engaged a panel of poetry experts for qualitative assessments. The results from both human and quantitative evaluations validate the efficacy of our method and contribute a novel perspective to poem-to-image generation with enhanced information capture in the generated images.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_13708
institution	arXiv
publishDate	2025
record_format	arxiv
title	PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2507.13708
