| Main Authors: | Taetz, Bertram, Bordelius, Gal |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.06009 |
Similar Items
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
by: Li, Yuheng, et al.
Published: (2024)
Improving Text Generation on Images with Synthetic Captions
by: Koh, Jun Young, et al.
Published: (2024)
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions
by: Gutflaish, Eyal, et al.
Published: (2025)
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
by: Zhang, Lin, et al.
Published: (2025)
Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
by: Ye, Qinghao, et al.
Published: (2025)
Panoptic Captioning: An Equivalence Bridge for Image and Text
by: Lin, Kun-Yu, et al.
Published: (2025)
ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs
by: Xu, Zitong, et al.
Published: (2026)
Amortized Inverse Kinematics via Graph Attention for Real-Time Human Avatar Animation
by: Khan, Muhammad Saif Ullah, et al.
Published: (2026)
Evaluating Image Caption via Cycle-consistent Text-to-Image Generation
by: Cui, Tianyu, et al.
Published: (2025)
Text Data-Centric Image Captioning with Interactive Prompts
by: Wang, Yiyu, et al.
Published: (2024)
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)
Text-only Synthesis for Image Captioning
by: Zhou, Qing, et al.
Published: (2024)
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
by: Teja, L. D. M. S. Sai, et al.
Published: (2025)
Learning to Rank Caption Chains for Video-Text Alignment
by: Blume, Ansel, et al.
Published: (2026)
Policy Optimized Text-to-Image Pipeline Design
by: Gadot, Uri, et al.
Published: (2025)
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
by: Wang, Xinran, et al.
Published: (2025)
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)
LCM-Lookahead for Encoder-based Text-to-Image Personalization
by: Gal, Rinon, et al.
Published: (2024)
Transformers in Medicine: Improving Vision-Language Alignment for Medical Image Captioning
by: Suresh, Yogesh Thakku, et al.
Published: (2025)
Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)
Text-to-Image Alignment in Denoising-Based Models through Step Selection
by: Grimal, Paul, et al.
Published: (2025)
Improving Long-Text Alignment for Text-to-Image Diffusion Models
by: Liu, Luping, et al.
Published: (2024)
Image Captions are Natural Prompts for Text-to-Image Models
by: Lei, Shiye, et al.
Published: (2023)
VIXEN: Visual Text Comparison Network for Image Difference Captioning
by: Black, Alexander, et al.
Published: (2024)
Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M)
by: Merchant, Nicholas, et al.
Published: (2025)
Key-Locked Rank One Editing for Text-to-Image Personalization
by: Tewel, Yoad, et al.
Published: (2023)
Pretrained Image-Text Models are Secretly Video Captioners
by: Zhang, Chunhui, et al.
Published: (2025)
Is Your Text-to-Image Model Robust to Caption Noise?
by: Yu, Weichen, et al.
Published: (2024)
Data-Driven Loss Functions for Inference-Time Optimization in Text-to-Image
by: Yiflach, Sapir Esther, et al.
Published: (2025)
CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
by: Gal, Rinon, et al.
Published: (2024)
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
by: Kim, Seoyeon, et al.
Published: (2023)
EAMA : Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
by: Zhang, Junzhe, et al.
Published: (2024)
Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision
by: Zanca, Dario, et al.
Published: (2024)
CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
by: Luo, Jianjie, et al.
Published: (2024)
Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
by: Samuel, Dvir, et al.
Published: (2023)
Image Generation from Image Captioning -- Invertible Approach
by: Menon, Nandakishore S, et al.
Published: (2024)
Language-Image Alignment with Fixed Text Encoders
by: Yang, Jingfeng, et al.
Published: (2025)