| Main Authors: | Cao, Stanley; Young, Sonny |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.18949 |
Similar Items
Figuring out Figures: Using Textual References to Caption Scientific Figures
by: Cao, Stanley, et al.
Published: (2024)
Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models
by: Chen, Yule, et al.
Published: (2025)
Towards Faithful Reasoning in Comics for Small MLLMs
by: Feng, Chengcheng, et al.
Published: (2026)
CaptionFool: Universal Image Captioning Model Attacks
by: Parekh, Swapnil
Published: (2026)
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
by: Teja, L. D. M. S. Sai, et al.
Published: (2025)
Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
by: Ryan, Yuriel, et al.
Published: (2025)
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
by: Song, Yuchen, et al.
Published: (2025)
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
by: Sun, Jiayang, et al.
Published: (2026)
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
by: Yang, Chenglin, et al.
Published: (2023)
Image Captioning in news report scenario
by: Liu, Tianrui, et al.
Published: (2024)
Automated Image Captioning with CNNs and Transformers
by: Cahyono, Joshua Adrian, et al.
Published: (2024)
Caption This, Reason That: VLMs Caught in the Middle
by: Weng, Zihan, et al.
Published: (2025)
URECA: Unique Region Caption Anything
by: Lim, Sangbeom, et al.
Published: (2025)
Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)
Mitigating Open-Vocabulary Caption Hallucinations
by: Ben-Kish, Assaf, et al.
Published: (2023)
SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning
by: Kim, Ye-Chan, et al.
Published: (2026)
Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning
by: You, Xiaoxing, et al.
Published: (2025)
Self-Explainable Affordance Learning with Embodied Caption
by: Zhang, Zhipeng, et al.
Published: (2024)
Image Embedding Sampling Method for Diverse Captioning
by: Waheed, Sania, et al.
Published: (2025)
Top-Down Semantic Refinement for Image Captioning
by: Zhang, Jusheng, et al.
Published: (2025)
Parrot Captions Teach CLIP to Spot Text
by: Lin, Yiqi, et al.
Published: (2023)
Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs
by: Pandey, Ananya, et al.
Published: (2024)
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
by: Li, Zizun, et al.
Published: (2025)
ALOHa: A New Measure for Hallucination in Captioning Models
by: Petryk, Suzanne, et al.
Published: (2024)
Is Your Text-to-Image Model Robust to Caption Noise?
by: Yu, Weichen, et al.
Published: (2024)
ReflectCAP: Detailed Image Captioning with Reflective Memory
by: Min, Kyungmin, et al.
Published: (2026)
An Ensemble Model with Attention Based Mechanism for Image Captioning
by: Badarneh, Israa Al, et al.
Published: (2025)
Describe Anything: Detailed Localized Image and Video Captioning
by: Lian, Long, et al.
Published: (2025)
Towards Fine-Grained Human Motion Video Captioning
by: Song, Guorui, et al.
Published: (2025)
Generating Accurate and Detailed Captions for High-Resolution Images
by: Lee, Hankyeol, et al.
Published: (2025)
Leveraging Textual Compositional Reasoning for Robust Change Captioning
by: Park, Kyu Ri, et al.
Published: (2025)
On Explaining Visual Captioning with Hybrid Markov Logic Networks
by: Shah, Monika, et al.
Published: (2025)
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
by: Hsieh, Yu-Guan, et al.
Published: (2024)
CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
by: Fiastre, Gabriel, et al.
Published: (2025)
XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)
Uterine Ultrasound Image Captioning Using Deep Learning Techniques
by: Boulesnane, Abdennour, et al.
Published: (2024)
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
by: Jiang, Yanbei, et al.
Published: (2024)
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
by: Pham, Anh-Cuong, et al.
Published: (2024)