Bibliographic Details
Main Authors: Jamil, Sofia, Reddy, Bollampalli Areen, Kumar, Raghvendra, Saha, Sriparna, Goswami, Koustava, Joseph, K. J.
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2507.13708
author Jamil, Sofia
Reddy, Bollampalli Areen
Kumar, Raghvendra
Saha, Sriparna
Goswami, Koustava
Joseph, K. J.
contents Recent advancements in text-to-image diffusion models have achieved remarkable success in generating realistic and diverse visual content. A critical factor in this process is the model's ability to accurately interpret textual prompts. However, these models often struggle with creative expressions, particularly those involving complex, abstract, or highly descriptive language. In this work, we introduce a novel training-free approach tailored to improve image generation for a unique form of creative language: poetic verse, which frequently features layered, abstract, and dual meanings. Our proposed PoemTale Diffusion approach aims to minimise the information that is lost during poetic text-to-image conversion by integrating a multi-stage prompt refinement loop into language models to enhance the interpretability of poetic texts. To support this, we adapt existing state-of-the-art diffusion models by modifying their self-attention mechanisms with a consistent self-attention technique to generate multiple consistent images, which are then collectively used to convey the poem's meaning. Moreover, to encourage research in the field of poetry, we introduce the P4I (PoemForImage) dataset, consisting of 1111 poems sourced from multiple online and offline resources. We engaged a panel of poetry experts for qualitative assessments. The results from both human and quantitative evaluations validate the efficacy of our method and contribute a novel perspective to poem-to-image generation with enhanced information capture in the generated images.
format Preprint
id arxiv_https___arxiv_org_abs_2507_13708
institution arXiv
publishDate 2025
record_format arxiv
title PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2507.13708