:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Delong; Li, Haiwen; Zhao, Zhicheng; Dong, Yuan
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2307.09059

Similar Items

Automatic Synthetic Data and Fine-grained Adaptive Feature Alignment for Composed Person Retrieval
by: Liu, Delong, et al.
Published: (2023)

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
by: Jiang, Dongzhi, et al.
Published: (2024)

Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval
by: Shen, Li-Cheng, et al.
Published: (2025)

Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation
by: Chen, Wenting, et al.
Published: (2023)

Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)

Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting
by: Chen, Wenting, et al.
Published: (2024)

Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval
by: Li, Haiwen, et al.
Published: (2025)

Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)

Text-only Synthesis for Image Captioning
by: Zhou, Qing, et al.
Published: (2024)

Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion
by: Lv, Zheqi, et al.
Published: (2025)

ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
by: Kim, Taewhan, et al.
Published: (2024)

Teaching Text-to-Image Models to Communicate in Dialog
by: Sun, Xiaowen, et al.
Published: (2023)

MULTI: Multimodal Understanding Leaderboard with Text and Images
by: Zhu, Zichen, et al.
Published: (2024)

Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports
by: Guo, Guangyu, et al.
Published: (2024)

Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
by: Zhu, Mengdan, et al.
Published: (2025)

Holistic Evaluation for Interleaved Text-and-Image Generation
by: Liu, Minqian, et al.
Published: (2024)

INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
by: Vendrow, Edward, et al.
Published: (2024)

DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
by: Ventura, Mor, et al.
Published: (2025)

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
by: Lyu, Yueming, et al.
Published: (2023)

Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
by: Kim, Hyungjin, et al.
Published: (2025)

Enhancing Steganographic Text Extraction: Evaluating the Impact of NLP Models on Accuracy and Semantic Coherence
by: Li, Mingyang, et al.
Published: (2024)

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
by: He, Yutong, et al.
Published: (2024)

Image-Text Relation Prediction for Multilingual Tweets
by: Rikters, Matīss, et al.
Published: (2025)

ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
by: Zhang, Leixin, et al.
Published: (2024)

GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval
by: Zou, Hao, et al.
Published: (2025)

Discriminative Probing and Tuning for Text-to-Image Generation
by: Qu, Leigang, et al.
Published: (2024)

Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
by: Kou, Siqi, et al.
Published: (2024)

Progressive Image Restoration via Text-Conditioned Video Generation
by: Kang, Peng, et al.
Published: (2025)

Evaluating Text-to-Visual Generation with Image-to-Text Generation
by: Lin, Zhiqiu, et al.
Published: (2024)

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
by: Li, Zhang, et al.
Published: (2023)

MATE: Meet At The Embedding -- Connecting Images with Long Texts
by: Jang, Young Kyun, et al.
Published: (2024)

TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
by: Qu, Leigang, et al.
Published: (2024)

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
by: Zhu, Xiangru, et al.
Published: (2024)

MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing
by: Cheng, Liwei, et al.
Published: (2026)

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)

Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)

TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models
by: Holtermann, Carolin, et al.
Published: (2026)

ComCLIP: Training-Free Compositional Image and Text Matching
by: Jiang, Kenan, et al.
Published: (2022)

Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
by: Kang, Bin, et al.
Published: (2024)

MINOS: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text
by: Zhang, Junzhe, et al.
Published: (2025)