| Main Authors: | Wu, Si; Smith, David A. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2306.03168 |
Similar Items
Evaluating the Evaluators: Metrics for Compositional Text-to-Image Generation
by: Kasaei, Seyed Amir, et al.
Published: (2025)
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models
by: Huang, Jia-Hong, et al.
Published: (2024)
Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation
by: Tang, Raphael, et al.
Published: (2024)
Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance
by: Zhao, Haozhe, et al.
Published: (2024)
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
by: Dong, Xiaoyi, et al.
Published: (2024)
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
by: Jiang, Dongzhi, et al.
Published: (2024)
Universal Prompt Optimizer for Safe Text-to-Image Generation
by: Wu, Zongyu, et al.
Published: (2024)
ComCLIP: Training-Free Compositional Image and Text Matching
by: Jiang, Kenan, et al.
Published: (2022)
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)
VisRet: Visualization Improves Knowledge-Intensive Text-to-Image Retrieval
by: Wu, Di, et al.
Published: (2025)
DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation
by: Wang, Jiapeng, et al.
Published: (2024)
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
by: Fang, Rongyao, et al.
Published: (2025)
Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)
TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation
by: Ozaki, Shintaro, et al.
Published: (2025)
BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
by: Miranda, Imanol, et al.
Published: (2024)
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
by: Feng, Weixi, et al.
Published: (2024)
Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
by: Chen, Dongping, et al.
Published: (2024)
Optimizing Prompts for Text-to-Image Generation
by: Hao, Yaru, et al.
Published: (2022)
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
by: Huang, Han, et al.
Published: (2024)
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
by: Jia, Mengzhao, et al.
Published: (2024)
Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
by: Dinkar, Tanvi, et al.
Published: (2025)
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
by: Wang, Weizhi, et al.
Published: (2024)
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
by: Wang, Bin, et al.
Published: (2024)
CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback
by: Wan, Yixin, et al.
Published: (2025)
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
by: Wu, Yin, et al.
Published: (2025)
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
by: Hu, Yushi, et al.
Published: (2025)
CEIDM: A Controlled Entity and Interaction Diffusion Model for Enhanced Text-to-Image Generation
by: Yang, Mingyue, et al.
Published: (2025)
Fast Prompt Alignment for Text-to-Image Generation
by: Mrini, Khalil, et al.
Published: (2024)
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
by: Chin, Zhi-Yi, et al.
Published: (2023)
Quality-Aware Image-Text Alignment for Opinion-Unaware Image Quality Assessment
by: Agnolucci, Lorenzo, et al.
Published: (2024)
Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models
by: Shin, Philip Wootaek, et al.
Published: (2024)
Taming the Tri-Space Tension: ARC-Guided Hallucination Modeling and Control for Text-to-Image Generation
by: Yang, Jianjiang, et al.
Published: (2025)
ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models
by: Ruan, Chenxi, et al.
Published: (2026)
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
by: Tian, Changyao, et al.
Published: (2024)
Teaching Text-to-Image Models to Communicate in Dialog
by: Sun, Xiaowen, et al.
Published: (2023)
Evaluating Numerical Reasoning in Text-to-Image Models
by: Kajić, Ivana, et al.
Published: (2024)
Emergent Visual-Semantic Hierarchies in Image-Text Representations
by: Alper, Morris, et al.
Published: (2024)
Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models
by: Conwell, Colin, et al.
Published: (2024)