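Staff View: LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations :: Library Catalog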


Bibliographic Details
Main Authors:	Yang, Zhichao; Gu, Tianjiao; Wang, Jianjie; Lin, Feiyu; Sheng, Xiangfei; Chen, Pengfei; Li, Leida
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.09271

_version_	1866915664908779520
author	Yang, Zhichao; Gu, Tianjiao; Wang, Jianjie; Lin, Feiyu; Sheng, Xiangfei; Chen, Pengfei; Li, Leida
author_facet	Yang, Zhichao; Gu, Tianjiao; Wang, Jianjie; Lin, Feiyu; Sheng, Xiangfei; Chen, Pengfei; Li, Leida
contents	The increasing popularity of long Text-to-Image (T2I) generation has created an urgent need for automatic and interpretable models that can evaluate image-text alignment in long-prompt scenarios. However, existing T2I alignment benchmarks predominantly focus on short prompts and provide only mean opinion score (MOS) or Likert-scale annotations. This limitation hinders the development of long T2I evaluators, particularly with respect to the interpretability of alignment. In this study, we contribute LongT2IBench, which comprises 14K long text-image pairs accompanied by graph-structured human annotations. Given the detail-intensive nature of long prompts, we first design a Generate-Refine-Qualify annotation protocol that converts them into textual graph structures encompassing entities, attributes, and relations. This transformation allows fine-grained alignment annotations to be made on these granular elements. Finally, the graph-structured annotations are converted into alignment scores and interpretations to facilitate the design of T2I evaluation models. Based on LongT2IBench, we further propose LongT2IExpert, a long T2I evaluator that enables multi-modal large language models (MLLMs) to provide both quantitative scores and structured interpretations through an instruction-tuning process with Hierarchical Alignment Chain-of-Thought (CoT). Extensive experiments and comparisons demonstrate the superiority of the proposed LongT2IExpert in alignment evaluation and interpretation. Data and code have been released at https://welldky.github.io/LongT2IBench-Homepage/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_09271
institution	arXiv
publishDate	2025
record_format	arxiv
title	LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.09271
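The abstract states that prompts are turned into textual graphs of entities, attributes, and relations, and that the element-level annotations are then converted into alignment scores and interpretations, but this record does not give the conversion rule. The following is a minimal sketch under loudly stated assumptions: each graph element carries a binary human "aligned" judgment, the score is the mean of the per-type fractions of aligned elements, and the interpretation is simply the list of misaligned elements. The names Element, PromptGraph, score, and interpretation are hypothetical and are not taken from the released LongT2IBench code.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One node or edge of a hypothetical prompt graph (assumed schema)."""
    kind: str      # assumed to be "entity", "attribute", or "relation"
    text: str      # e.g. "umbrella", "umbrella is red", "man holds umbrella"
    aligned: bool  # assumed binary human judgment: is it depicted correctly?


@dataclass
class PromptGraph:
    elements: list = field(default_factory=list)

    def score(self) -> float:
        """Hypothetical rule: average the aligned fraction of each element type present."""
        fractions = []
        for kind in ("entity", "attribute", "relation"):
            items = [e for e in self.elements if e.kind == kind]
            if items:
                # Booleans sum as 0/1, giving the count of aligned elements.
                fractions.append(sum(e.aligned for e in items) / len(items))
        return sum(fractions) / len(fractions) if fractions else 0.0

    def interpretation(self) -> list:
        """List the elements judged missing or wrong, as a simple explanation."""
        return [f"{e.kind}: {e.text}" for e in self.elements if not e.aligned]


if __name__ == "__main__":
    graph = PromptGraph([
        Element("entity", "man", True),
        Element("entity", "umbrella", True),
        Element("attribute", "umbrella is red", False),
        Element("relation", "man holds umbrella", True),
    ])
    print(round(graph.score(), 3))   # entity 1.0, attribute 0.0, relation 1.0
    print(graph.interpretation())
```

Averaging per type rather than over all elements keeps a prompt with many entities from drowning out a few missed relations; whether LongT2IBench weights element types this way is not stated in this record.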
