Library Catalog

Bibliographic Details
Main Authors:	Wu, Si; Smith, David A.
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2306.03168

Similar Items

Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation
by: Kasaei, Seyed Amir, et al.
Published: (2025)

Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models
by: Huang, Jia-Hong, et al.
Published: (2024)

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation
by: Tang, Raphael, et al.
Published: (2024)

Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance
by: Zhao, Haozhe, et al.
Published: (2024)

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
by: Dong, Xiaoyi, et al.
Published: (2024)

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
by: Jiang, Dongzhi, et al.
Published: (2024)

Universal Prompt Optimizer for Safe Text-to-Image Generation
by: Wu, Zongyu, et al.
Published: (2024)

ComCLIP: Training-Free Compositional Image and Text Matching
by: Jiang, Kenan, et al.
Published: (2022)

Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)

VisRet: Visualization Improves Knowledge-Intensive Text-to-Image Retrieval
by: Wu, Di, et al.
Published: (2025)

DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation
by: Wang, Jiapeng, et al.
Published: (2024)

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
by: Fang, Rongyao, et al.
Published: (2025)

Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)

TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation
by: Ozaki, Shintaro, et al.
Published: (2025)

BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
by: Miranda, Imanol, et al.
Published: (2024)

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
by: Feng, Weixi, et al.
Published: (2024)

Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
by: Chen, Dongping, et al.
Published: (2024)

Optimizing Prompts for Text-to-Image Generation
by: Hao, Yaru, et al.
Published: (2022)

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
by: Huang, Han, et al.
Published: (2024)

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
by: Jia, Mengzhao, et al.
Published: (2024)

Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
by: Dinkar, Tanvi, et al.
Published: (2025)

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
by: Wang, Weizhi, et al.
Published: (2024)

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
by: Wang, Bin, et al.
Published: (2024)

CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback
by: Wan, Yixin, et al.
Published: (2025)

Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
by: Wu, Yin, et al.
Published: (2025)

Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
by: Hu, Yushi, et al.
Published: (2025)

CEIDM: A Controlled Entity and Interaction Diffusion Model for Enhanced Text-to-Image Generation
by: Yang, Mingyue, et al.
Published: (2025)

Fast Prompt Alignment for Text-to-Image Generation
by: Mrini, Khalil, et al.
Published: (2024)

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
by: Chin, Zhi-Yi, et al.
Published: (2023)

Quality-Aware Image-Text Alignment for Opinion-Unaware Image Quality Assessment
by: Agnolucci, Lorenzo, et al.
Published: (2024)

Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models
by: Shin, Philip Wootaek, et al.
Published: (2024)

Taming the Tri-Space Tension: ARC-Guided Hallucination Modeling and Control for Text-to-Image Generation
by: Yang, Jianjiang, et al.
Published: (2025)

ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models
by: Ruan, Chenxi, et al.
Published: (2026)

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
by: Tian, Changyao, et al.
Published: (2024)

Teaching Text-to-Image Models to Communicate in Dialog
by: Sun, Xiaowen, et al.
Published: (2023)

Evaluating Numerical Reasoning in Text-to-Image Models
by: Kajić, Ivana, et al.
Published: (2024)

Emergent Visual-Semantic Hierarchies in Image-Text Representations
by: Alper, Morris, et al.
Published: (2024)

Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models
by: Conwell, Colin, et al.
Published: (2024)