| Main Authors: | Gondal, Moazzam Umer, Qudous, Hamad Ul, Siddiqui, Daniya, Farhan, Asma Ahmad |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.19149 |
Similar Items
Beyond the Hype: Comparing Lightweight and Deep Learning Models for Air Quality Forecasting
by: Gondal, Moazzam Umer, et al.
Published: (2025)
Interpretable PM2.5 Forecasting for Urban Air Quality: A Comparative Study of Operational Time-Series Models
by: Gondal, Moazzam Umer, et al.
Published: (2026)
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
by: Li, Wenyan, et al.
Published: (2024)
Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
by: Fonseca, Rui, et al.
Published: (2025)
From Pixels to Prose: A Large Dataset of Dense Image Captions
by: Singla, Vasu, et al.
Published: (2024)
Towards Retrieval-Augmented Architectures for Image Captioning
by: Sarto, Sara, et al.
Published: (2024)
Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
by: Liu, Peiyang, et al.
Published: (2026)
Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning
by: Naz, Zubia, et al.
Published: (2025)
Temporal Image Caption Retrieval Competition -- Description and Results
by: Pokrywka, Jakub, et al.
Published: (2024)
From Image Captioning to Visual Storytelling
by: Passadakis, Admitos, et al.
Published: (2025)
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
by: Anagnostopoulou, Aliki, et al.
Published: (2023)
MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation
by: Loo, Gowen, et al.
Published: (2025)
Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task
by: Dhawan, Aashish, et al.
Published: (2026)
OmniCaptioner: One Captioner to Rule Them All
by: Lu, Yiting, et al.
Published: (2025)
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
by: Gao, Bingjie, et al.
Published: (2025)
AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
by: Chaturvedi, Saket S., et al.
Published: (2025)
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
by: Song, Steven, et al.
Published: (2024)
Multi-LLM Collaborative Caption Generation in Scientific Documents
by: Kim, Jaeyoung, et al.
Published: (2025)
Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation
by: Sanguigni, Fulvio, et al.
Published: (2025)
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)
Retrieval-Augmented Egocentric Video Captioning
by: Xu, Jilan, et al.
Published: (2024)
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
by: Gwilliam, Matthew, et al.
Published: (2023)
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
by: Lu, Songshuo, et al.
Published: (2024)
AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
by: Wang, Zhengren, et al.
Published: (2026)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
by: Gao, Sensen, et al.
Published: (2025)
Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)
From Reasoning to Pixels: Benchmarking the Alignment Gap in Unified Multimodal Models
by: Yang, Cheng, et al.
Published: (2026)
Open Vocabulary Panoptic Segmentation With Retrieval Augmentation
by: Sadeq, Nafis, et al.
Published: (2026)
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning
by: Chaffin, Antoine, et al.
Published: (2024)
Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented Generation
by: Zhao, Shu, et al.
Published: (2025)
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
by: Wu, Yin, et al.
Published: (2025)
PixelWorld: How Far Are We from Perceiving Everything as Pixels?
by: Lyu, Zhiheng, et al.
Published: (2025)
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
by: Shang, Yuying, et al.
Published: (2024)
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)
Culture-Aware Humorous Captioning: Multimodal Humor Generation across Cultural Contexts
by: Xu, Run, et al.
Published: (2026)
CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation
by: Yayavaram, Arnav, et al.
Published: (2025)
VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph
by: Wang, Qiuchen, et al.
Published: (2026)
VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation
by: Sun, Yubo, et al.
Published: (2025)
Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
by: Shen, Wenxuan, et al.
Published: (2025)
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
by: Dong, Kuicai, et al.
Published: (2025)