:: Library Catalog

Bibliographic Details
Main Authors:	Gondal, Moazzam Umer; Qudous, Hamad Ul; Siddiqui, Daniya; Farhan, Asma Ahmad
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2511.19149

Similar Items

Beyond the Hype: Comparing Lightweight and Deep Learning Models for Air Quality Forecasting
by: Gondal, Moazzam Umer, et al.
Published: (2025)

Interpretable PM2.5 Forecasting for Urban Air Quality: A Comparative Study of Operational Time-Series Models
by: Gondal, Moazzam Umer, et al.
Published: (2026)

Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
by: Li, Wenyan, et al.
Published: (2024)

Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
by: Fonseca, Rui, et al.
Published: (2025)

From Pixels to Prose: A Large Dataset of Dense Image Captions
by: Singla, Vasu, et al.
Published: (2024)

Towards Retrieval-Augmented Architectures for Image Captioning
by: Sarto, Sara, et al.
Published: (2024)

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
by: Liu, Peiyang, et al.
Published: (2026)

Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning
by: Naz, Zubia, et al.
Published: (2025)

Temporal Image Caption Retrieval Competition -- Description and Results
by: Pokrywka, Jakub, et al.
Published: (2024)

From Image Captioning to Visual Storytelling
by: Passadakis, Admitos, et al.
Published: (2025)

Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
by: Anagnostopoulou, Aliki, et al.
Published: (2023)

MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation
by: Loo, Gowen, et al.
Published: (2025)

Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task
by: Dhawan, Aashish, et al.
Published: (2026)

OmniCaptioner: One Captioner to Rule Them All
by: Lu, Yiting, et al.
Published: (2025)

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
by: Gao, Bingjie, et al.
Published: (2025)

AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
by: Chaturvedi, Saket S., et al.
Published: (2025)

LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
by: Song, Steven, et al.
Published: (2024)

Multi-LLM Collaborative Caption Generation in Scientific Documents
by: Kim, Jaeyoung, et al.
Published: (2025)

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation
by: Sanguigni, Fulvio, et al.
Published: (2025)

From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)

Retrieval-Augmented Egocentric Video Captioning
by: Xu, Jilan, et al.
Published: (2024)

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
by: Gwilliam, Matthew, et al.
Published: (2023)

TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
by: Lu, Songshuo, et al.
Published: (2024)

AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
by: Wang, Zhengren, et al.
Published: (2026)

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
by: Gao, Sensen, et al.
Published: (2025)

Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)

From Reasoning to Pixels: Benchmarking the Alignment Gap in Unified Multimodal Models
by: Yang, Cheng, et al.
Published: (2026)

Open Vocabulary Panoptic Segmentation With Retrieval Augmentation
by: Sadeq, Nafis, et al.
Published: (2026)

Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning
by: Chaffin, Antoine, et al.
Published: (2024)

Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented Generation
by: Zhao, Shu, et al.
Published: (2025)

Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
by: Wu, Yin, et al.
Published: (2025)

PixelWorld: How Far Are We from Perceiving Everything as Pixels?
by: Lyu, Zhiheng, et al.
Published: (2025)

From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
by: Shang, Yuying, et al.
Published: (2024)

Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)

Culture-Aware Humorous Captioning: Multimodal Humor Generation across Cultural Contexts
by: Xu, Run, et al.
Published: (2026)

CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation
by: Yayavaram, Arnav, et al.
Published: (2025)

VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph
by: Wang, Qiuchen, et al.
Published: (2026)

VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation
by: Sun, Yubo, et al.
Published: (2025)

Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
by: Shen, Wenxuan, et al.
Published: (2025)

Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
by: Dong, Kuicai, et al.
Published: (2025)