:: Library Catalog

Bibliographic Details
Main Authors:	Li, Binbin; Yang, Guimiao; Qi, Zisen; Wang, Haiping; Ding, Yu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.24813

Similar Items

RACap: Relation-Aware Prompting for Lightweight Retrieval-Augmented Image Captioning
by: Long, Xiaosheng, et al.
Published: (2025)

ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
by: Kim, Taewhan, et al.
Published: (2024)

DualPrompt-MedCap: A Dual-Prompt Enhanced Approach for Medical Image Captioning
by: Zhao, Yining, et al.
Published: (2025)

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
by: Xing, Long, et al.
Published: (2025)

Hierarchical Dual-Change Collaborative Learning for UAV Scene Change Captioning
by: Chen, Fuhai, et al.
Published: (2026)

CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
by: Tang, Zhijiang, et al.
Published: (2026)

Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)

Dual Prompting Image Restoration with Diffusion Transformers
by: Kong, Dehong, et al.
Published: (2025)

XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning
by: You, Zuyao, et al.
Published: (2025)

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
by: Fan, Tiehan, et al.
Published: (2024)

DualTSR: Unified Dual-Diffusion Transformer for Scene Text Image Super-Resolution
by: Niu, Axi, et al.
Published: (2026)

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
by: Ye, Shaokai, et al.
Published: (2026)

DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning
by: Wei, Yuancheng, et al.
Published: (2026)

Dual Relation Alignment for Composed Image Retrieval
by: Jiang, Xintong, et al.
Published: (2023)

RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
by: Huang, Tzu-Heng, et al.
Published: (2026)

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
by: Hsieh, Yu-Guan, et al.
Published: (2024)

Beyond Prompt Degradation: Prototype-guided Dual-pool Prompting for Incremental Object Detection
by: Zhang, Yaoteng, et al.
Published: (2026)

Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
by: Zeng, Gangyan, et al.
Published: (2024)

CompCap: Improving Multimodal Large Language Models with Composite Captions
by: Chen, Xiaohui, et al.
Published: (2024)

Image Captioning in news report scenario
by: Liu, Tianrui, et al.
Published: (2024)

Is Your Text-to-Image Model Robust to Caption Noise?
by: Yu, Weichen, et al.
Published: (2024)

Image Captions are Natural Prompts for Text-to-Image Models
by: Lei, Shiye, et al.
Published: (2023)

Dual-Modal Prompting for Sketch-Based Image Retrieval
by: Gao, Liying, et al.
Published: (2024)

TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors
by: Cheng, Wei-Yuan, et al.
Published: (2026)

ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)

LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
by: Zhang, Haomin, et al.
Published: (2025)

Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
by: Lee, Ji Soo, et al.
Published: (2025)

ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization
by: Nguyen, Thinh-Phuc, et al.
Published: (2025)

Dual-Stream Collaborative Transformer for Image Captioning
by: Wan, Jun, et al.
Published: (2026)

CapGeo: A Caption-Assisted Approach to Geometric Reasoning
by: Li, Yuying, et al.
Published: (2025)

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
by: Jiang, Chen, et al.
Published: (2023)

Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection
by: Wang, Jiaxiang, et al.
Published: (2025)

VoCap: Video Object Captioning and Segmentation from Any Prompt
by: Uijlings, Jasper, et al.
Published: (2025)

Dual Latent Memory for Visual Multi-agent System
by: Yu, Xinlei, et al.
Published: (2026)

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
by: Jin, Bu, et al.
Published: (2024)

LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
by: Ng, Ho Yin 'Sam', et al.
Published: (2025)

RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
by: Gu, Jinjing, et al.
Published: (2025)

Retrieval-Guided Generation for Safer Histopathology Image Captioning
by: Hoq, Md. Enamul, et al.
Published: (2026)