:: Library Catalog

Bibliographic Details
Main authors:	Basioti, Kalliopi; Abdelsalam, Mohamed A.; Fancellu, Federico; Pavlovic, Vladimir; Fazly, Afsaneh
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
Online access:	https://arxiv.org/abs/2407.11393

Similar Items

Box2Flow: Instance-based Action Flow Graphs from Videos
by: Li, Jiatong, et al.
Published: (2024)

CIC: A Framework for Culturally-Aware Image Captioning
by: Yun, Youngsik, et al.
Published: (2024)

GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
by: Basioti, Kalliopi, et al.
Published: (2025)

Towards Retrieval-Augmented Architectures for Image Captioning
by: Sarto, Sara, et al.
Published: (2024)

Augmenting Perceptual Super-Resolution via Image Quality Predictors
by: Zhang, Fengjia, et al.
Published: (2025)

EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations
by: Kim, Hyunjong, et al.
Published: (2025)

Text-only Synthesis for Image Captioning
by: Zhou, Qing, et al.
Published: (2024)

The Role of Data Curation in Image Captioning
by: Li, Wenyan, et al.
Published: (2023)

From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation
by: Gondal, Moazzam Umer, et al.
Published: (2025)

Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M)
by: Merchant, Nicholas, et al.
Published: (2025)

Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task
by: Dhawan, Aashish, et al.
Published: (2026)

CAPEEN: Image Captioning with Early Exits and Knowledge Distillation
by: Bajpai, Divya Jyoti, et al.
Published: (2024)

Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
by: Li, Wenyan, et al.
Published: (2024)

The Devil is in the EOS: Sequence Training for Detailed Image Captioning
by: Mohamed, Abdelrahman, et al.
Published: (2025)

Explaining Caption-Image Interactions in CLIP Models with Second-Order Attributions
by: Möller, Lucas, et al.
Published: (2024)

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
by: Sarto, Sara, et al.
Published: (2025)

The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning
by: Bai, Longju, et al.
Published: (2024)

DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
by: Matsuda, Kazuki, et al.
Published: (2024)

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
by: Wada, Yuiga, et al.
Published: (2024)

SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
by: Kim, Si-Woo, et al.
Published: (2025)

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
by: Xing, Long, et al.
Published: (2025)

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
by: Matsuda, Kazuki, et al.
Published: (2025)

Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
by: Anagnostopoulou, Aliki, et al.
Published: (2023)

Discovering Meaningful Units with Visually Grounded Semantics from Image Captions
by: Behjati, Melika, et al.
Published: (2025)

Inserting Faces inside Captions: Image Captioning with Attention Guided Merging
by: Tevissen, Yannis, et al.
Published: (2024)

AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
by: Choudhury, Pankaj, et al.
Published: (2025)

WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images
by: Chen, Pingyi, et al.
Published: (2023)

Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
by: Fonseca, Rui, et al.
Published: (2025)

LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
by: Hashemi, Mohammad Abuzar, et al.
Published: (2021)

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis
by: Bucciarelli, Davide, et al.
Published: (2024)

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
by: Sarto, Sara, et al.
Published: (2024)

Unveiling the Invisible: Captioning Videos with Metaphors
by: Kalarani, Abisek Rajakumar, et al.
Published: (2024)

LLM as a Neural Architect: Controlled Generation of Image Captioning Models Under Strict API Contracts
by: Jesani, Krunal, et al.
Published: (2025)

FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
by: Lee, Yebin, et al.
Published: (2024)

G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o
by: Tong, Tony Cheng, et al.
Published: (2024)

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
by: Moratelli, Nicholas, et al.
Published: (2024)

Updating CLIP to Prefer Descriptions Over Captions
by: Zur, Amir, et al.
Published: (2024)

An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
by: Ahmadi, Saba, et al.
Published: (2023)

ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)

Imagine How To Change: Explicit Procedure Modeling for Change Captioning
by: Sun, Jiayang, et al.
Published: (2026)