Library Catalog

Bibliographic Details
Main Authors:	Cao, Stanley; Young, Sonny
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.18949

Similar Items

Figuring out Figures: Using Textual References to Caption Scientific Figures
by: Cao, Stanley, et al.
Published: (2024)

Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models
by: Chen, Yule, et al.
Published: (2025)

Towards Faithful Reasoning in Comics for Small MLLMs
by: Feng, Chengcheng, et al.
Published: (2026)

CaptionFool: Universal Image Captioning Model Attacks
by: Parekh, Swapnil
Published: (2026)

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
by: Teja, L. D. M. S. Sai, et al.
Published: (2025)

Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
by: Ryan, Yuriel, et al.
Published: (2025)

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)

Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
by: Song, Yuchen, et al.
Published: (2025)

Imagine How To Change: Explicit Procedure Modeling for Change Captioning
by: Sun, Jiayang, et al.
Published: (2026)

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
by: Yang, Chenglin, et al.
Published: (2023)

Image Captioning in news report scenario
by: Liu, Tianrui, et al.
Published: (2024)

Automated Image Captioning with CNNs and Transformers
by: Cahyono, Joshua Adrian, et al.
Published: (2024)

Caption This, Reason That: VLMs Caught in the Middle
by: Weng, Zihan, et al.
Published: (2025)

URECA: Unique Region Caption Anything
by: Lim, Sangbeom, et al.
Published: (2025)

Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)

Mitigating Open-Vocabulary Caption Hallucinations
by: Ben-Kish, Assaf, et al.
Published: (2023)

SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning
by: Kim, Ye-Chan, et al.
Published: (2026)

Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning
by: You, Xiaoxing, et al.
Published: (2025)

Self-Explainable Affordance Learning with Embodied Caption
by: Zhang, Zhipeng, et al.
Published: (2024)

Image Embedding Sampling Method for Diverse Captioning
by: Waheed, Sania, et al.
Published: (2025)

Top-Down Semantic Refinement for Image Captioning
by: Zhang, Jusheng, et al.
Published: (2025)

Parrot Captions Teach CLIP to Spot Text
by: Lin, Yiqi, et al.
Published: (2023)

Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs
by: Pandey, Ananya, et al.
Published: (2024)

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
by: Li, Zizun, et al.
Published: (2025)

ALOHa: A New Measure for Hallucination in Captioning Models
by: Petryk, Suzanne, et al.
Published: (2024)

Is Your Text-to-Image Model Robust to Caption Noise?
by: Yu, Weichen, et al.
Published: (2024)

ReflectCAP: Detailed Image Captioning with Reflective Memory
by: Min, Kyungmin, et al.
Published: (2026)

An Ensemble Model with Attention Based Mechanism for Image Captioning
by: Badarneh, Israa Al, et al.
Published: (2025)

Describe Anything: Detailed Localized Image and Video Captioning
by: Lian, Long, et al.
Published: (2025)

Towards Fine-Grained Human Motion Video Captioning
by: Song, Guorui, et al.
Published: (2025)

Generating Accurate and Detailed Captions for High-Resolution Images
by: Lee, Hankyeol, et al.
Published: (2025)

Leveraging Textual Compositional Reasoning for Robust Change Captioning
by: Park, Kyu Ri, et al.
Published: (2025)

On Explaining Visual Captioning with Hybrid Markov Logic Networks
by: Shah, Monika, et al.
Published: (2025)

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
by: Hsieh, Yu-Guan, et al.
Published: (2024)

CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
by: Fiastre, Gabriel, et al.
Published: (2025)

XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)

Uterine Ultrasound Image Captioning Using Deep Learning Techniques
by: Boulesnane, Abdennour, et al.
Published: (2024)

KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
by: Jiang, Yanbei, et al.
Published: (2024)

KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
by: Pham, Anh-Cuong, et al.
Published: (2024)