:: Library Catalog

Bibliographic Details
Main Authors:	Taetz, Bertram, Bordelius, Gal
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.06009

Similar Items

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
by: Li, Yuheng, et al.
Published: (2024)

Improving Text Generation on Images with Synthetic Captions
by: Koh, Jun Young, et al.
Published: (2024)

Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions
by: Gutflaish, Eyal, et al.
Published: (2025)

SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
by: Zhang, Lin, et al.
Published: (2025)

Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
by: Ye, Qinghao, et al.
Published: (2025)

Panoptic Captioning: An Equivalence Bridge for Image and Text
by: Lin, Kun-Yu, et al.
Published: (2025)

ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs
by: Xu, Zitong, et al.
Published: (2026)

Amortized Inverse Kinematics via Graph Attention for Real-Time Human Avatar Animation
by: Khan, Muhammad Saif Ullah, et al.
Published: (2026)

Evaluating Image Caption via Cycle-consistent Text-to-Image Generation
by: Cui, Tianyu, et al.
Published: (2025)

Text Data-Centric Image Captioning with Interactive Prompts
by: Wang, Yiyu, et al.
Published: (2024)

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)

Text-only Synthesis for Image Captioning
by: Zhou, Qing, et al.
Published: (2024)

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
by: Teja, L. D. M. S. Sai, et al.
Published: (2025)

Learning to Rank Caption Chains for Video-Text Alignment
by: Blume, Ansel, et al.
Published: (2026)

Policy Optimized Text-to-Image Pipeline Design
by: Gadot, Uri, et al.
Published: (2025)

Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
by: Wang, Xinran, et al.
Published: (2025)

Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)

LCM-Lookahead for Encoder-based Text-to-Image Personalization
by: Gal, Rinon, et al.
Published: (2024)

Transformers in Medicine: Improving Vision-Language Alignment for Medical Image Captioning
by: Suresh, Yogesh Thakku, et al.
Published: (2025)

Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)

Text-to-Image Alignment in Denoising-Based Models through Step Selection
by: Grimal, Paul, et al.
Published: (2025)

Improving Long-Text Alignment for Text-to-Image Diffusion Models
by: Liu, Luping, et al.
Published: (2024)

Image Captions are Natural Prompts for Text-to-Image Models
by: Lei, Shiye, et al.
Published: (2023)

VIXEN: Visual Text Comparison Network for Image Difference Captioning
by: Black, Alexander, et al.
Published: (2024)

Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M)
by: Merchant, Nicholas, et al.
Published: (2025)

Key-Locked Rank One Editing for Text-to-Image Personalization
by: Tewel, Yoad, et al.
Published: (2023)

Pretrained Image-Text Models are Secretly Video Captioners
by: Zhang, Chunhui, et al.
Published: (2025)

Is Your Text-to-Image Model Robust to Caption Noise?
by: Yu, Weichen, et al.
Published: (2024)

Data-Driven Loss Functions for Inference-Time Optimization in Text-to-Image
by: Yiflach, Sapir Esther, et al.
Published: (2025)

CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)

ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
by: Gal, Rinon, et al.
Published: (2024)

Extending CLIP's Image-Text Alignment to Referring Image Segmentation
by: Kim, Seoyeon, et al.
Published: (2023)

EAMA : Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
by: Zhang, Junzhe, et al.
Published: (2024)

Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision
by: Zanca, Dario, et al.
Published: (2024)

CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
by: Luo, Jianjie, et al.
Published: (2024)

Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
by: Samuel, Dvir, et al.
Published: (2023)

Image Generation from Image Captioning -- Invertible Approach
by: Menon, Nandakishore S, et al.
Published: (2024)

Language-Image Alignment with Fixed Text Encoders
by: Yang, Jingfeng, et al.
Published: (2025)