| Main Authors: | Liu, Delong, Li, Haiwen, Zhao, Zhicheng, Dong, Yuan |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2307.09059 |
Similar Items
Automatic Synthetic Data and Fine-grained Adaptive Feature Alignment for Composed Person Retrieval
by: Liu, Delong, et al.
Published: (2023)
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
by: Jiang, Dongzhi, et al.
Published: (2024)
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval
by: Shen, Li-Cheng, et al.
Published: (2025)
Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation
by: Chen, Wenting, et al.
Published: (2023)
Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)
Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting
by: Chen, Wenting, et al.
Published: (2024)
Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval
by: Li, Haiwen, et al.
Published: (2025)
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)
Text-only Synthesis for Image Captioning
by: Zhou, Qing, et al.
Published: (2024)
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion
by: Lv, Zheqi, et al.
Published: (2025)
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
by: Kim, Taewhan, et al.
Published: (2024)
Teaching Text-to-Image Models to Communicate in Dialog
by: Sun, Xiaowen, et al.
Published: (2023)
MULTI: Multimodal Understanding Leaderboard with Text and Images
by: Zhu, Zichen, et al.
Published: (2024)
Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports
by: Guo, Guangyu, et al.
Published: (2024)
Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
by: Zhu, Mengdan, et al.
Published: (2025)
Holistic Evaluation for Interleaved Text-and-Image Generation
by: Liu, Minqian, et al.
Published: (2024)
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
by: Vendrow, Edward, et al.
Published: (2024)
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
by: Ventura, Mor, et al.
Published: (2025)
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
by: Lyu, Yueming, et al.
Published: (2023)
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
by: Kim, Hyungjin, et al.
Published: (2025)
Enhancing Steganographic Text Extraction: Evaluating the Impact of NLP Models on Accuracy and Semantic Coherence
by: Li, Mingyang, et al.
Published: (2024)
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
by: He, Yutong, et al.
Published: (2024)
Image-Text Relation Prediction for Multilingual Tweets
by: Rikters, Matīss, et al.
Published: (2025)
ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
by: Zhang, Leixin, et al.
Published: (2024)
GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval
by: Zou, Hao, et al.
Published: (2025)
Discriminative Probing and Tuning for Text-to-Image Generation
by: Qu, Leigang, et al.
Published: (2024)
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
by: Kou, Siqi, et al.
Published: (2024)
Progressive Image Restoration via Text-Conditioned Video Generation
by: Kang, Peng, et al.
Published: (2025)
Evaluating Text-to-Visual Generation with Image-to-Text Generation
by: Lin, Zhiqiu, et al.
Published: (2024)
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
by: Li, Zhang, et al.
Published: (2023)
MATE: Meet At The Embedding -- Connecting Images with Long Texts
by: Jang, Young Kyun, et al.
Published: (2024)
TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
by: Qu, Leigang, et al.
Published: (2024)
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
by: Zhu, Xiangru, et al.
Published: (2024)
MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing
by: Cheng, Liwei, et al.
Published: (2026)
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
by: Zhou, Shijie, et al.
Published: (2025)
Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)
TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models
by: Holtermann, Carolin, et al.
Published: (2026)
ComCLIP: Training-Free Compositional Image and Text Matching
by: Jiang, Kenan, et al.
Published: (2022)
Multi-path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
by: Kang, Bin, et al.
Published: (2024)
MINOS: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text
by: Zhang, Junzhe, et al.
Published: (2025)