| Main Authors: | Li, Binbin, Yang, Guimiao, Qi, Zisen, Wang, Haiping, Ding, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.24813 |
Similar Items
RACap: Relation-Aware Prompting for Lightweight Retrieval-Augmented Image Captioning
by: Long, Xiaosheng, et al.
Published: (2025)
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
by: Kim, Taewhan, et al.
Published: (2024)
DualPrompt-MedCap: A Dual-Prompt Enhanced Approach for Medical Image Captioning
by: Zhao, Yining, et al.
Published: (2025)
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
by: Xing, Long, et al.
Published: (2025)
Hierarchical Dual-Change Collaborative Learning for UAV Scene Change Captioning
by: Chen, Fuhai, et al.
Published: (2026)
CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
by: Tang, Zhijiang, et al.
Published: (2026)
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
by: Özdemir, Övgü, et al.
Published: (2024)
Dual Prompting Image Restoration with Diffusion Transformers
by: Kong, Dehong, et al.
Published: (2025)
XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)
Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning
by: You, Zuyao, et al.
Published: (2025)
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
by: Fan, Tiehan, et al.
Published: (2024)
DualTSR: Unified Dual-Diffusion Transformer for Scene Text Image Super-Resolution
by: Niu, Axi, et al.
Published: (2026)
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
by: Ye, Shaokai, et al.
Published: (2026)
DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning
by: Wei, Yuancheng, et al.
Published: (2026)
Dual Relation Alignment for Composed Image Retrieval
by: Jiang, Xintong, et al.
Published: (2023)
RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
by: Huang, Tzu-Heng, et al.
Published: (2026)
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
by: Hsieh, Yu-Guan, et al.
Published: (2024)
Beyond Prompt Degradation: Prototype-guided Dual-pool Prompting for Incremental Object Detection
by: Zhang, Yaoteng, et al.
Published: (2026)
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
by: Zeng, Gangyan, et al.
Published: (2024)
CompCap: Improving Multimodal Large Language Models with Composite Captions
by: Chen, Xiaohui, et al.
Published: (2024)
Image Captioning in news report scenario
by: Liu, Tianrui, et al.
Published: (2024)
Is Your Text-to-Image Model Robust to Caption Noise?
by: Yu, Weichen, et al.
Published: (2024)
Image Captions are Natural Prompts for Text-to-Image Models
by: Lei, Shiye, et al.
Published: (2023)
Dual-Modal Prompting for Sketch-Based Image Retrieval
by: Gao, Liying, et al.
Published: (2024)
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors
by: Cheng, Wei-Yuan, et al.
Published: (2026)
ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
by: Zhang, Haomin, et al.
Published: (2025)
Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
by: Lee, Ji Soo, et al.
Published: (2025)
ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization
by: Nguyen, Thinh-Phuc, et al.
Published: (2025)
Dual-Stream Collaborative Transformer for Image Captioning
by: Wan, Jun, et al.
Published: (2026)
CapGeo: A Caption-Assisted Approach to Geometric Reasoning
by: Li, Yuying, et al.
Published: (2025)
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
by: Jiang, Chen, et al.
Published: (2023)
Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection
by: Wang, Jiaxiang, et al.
Published: (2025)
VoCap: Video Object Captioning and Segmentation from Any Prompt
by: Uijlings, Jasper, et al.
Published: (2025)
Dual Latent Memory for Visual Multi-agent System
by: Yu, Xinlei, et al.
Published: (2026)
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
by: Jin, Bu, et al.
Published: (2024)
LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
by: Ng, Ho Yin 'Sam', et al.
Published: (2025)
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
by: Gu, Jinjing, et al.
Published: (2025)
Retrieval-Guided Generation for Safer Histopathology Image Captioning
by: Hoq, Md. Enamul, et al.
Published: (2026)