Library Catalog

Bibliographic Details
Main Authors:	Shim, Alexander; Saieh, Khalil; Clarke, Samuel
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.08226

Similar Items

ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG
by: Zhang, Zilun, et al.
Published: (2024)

Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval
by: Lim, Youngsun, et al.
Published: (2024)

Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
by: Lim, Youngsun, et al.
Published: (2024)

mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
by: Yuan, Xu, et al.
Published: (2025)

Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
by: Zhu, Mengdan, et al.
Published: (2025)

Clustering-based Image-Text Graph Matching for Domain Generalization
by: Park, Nokyung, et al.
Published: (2023)

M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
by: Anugraha, David, et al.
Published: (2025)

RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition
by: Sivaroopan, Nirhoshan, et al.
Published: (2025)

MV-RAG: Retrieval Augmented Multiview Diffusion
by: Dayani, Yosef, et al.
Published: (2025)

Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering
by: Naeem, Awais, et al.
Published: (2024)

Mesh RAG: Retrieval Augmentation for Autoregressive Mesh Generation
by: Sun, Xiatao, et al.
Published: (2025)

Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation
by: Sanguigni, Fulvio, et al.
Published: (2025)

QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering
by: Jiang, Zhuohang, et al.
Published: (2025)

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
by: Hsiao, Chi-Hsiang, et al.
Published: (2025)

Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework
by: Yang, Yuming, et al.
Published: (2025)

CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models
by: Alam, Hasan Md Tusfiqur, et al.
Published: (2025)

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
by: Luo, Yongdong, et al.
Published: (2024)

SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
by: Zeng, Nianbo, et al.
Published: (2025)

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
by: Ha, Hyeonjeong, et al.
Published: (2025)

VLADriver-RAG: Retrieval-Augmented Vision-Language-Action Models for Autonomous Driving
by: Zhao, Rui, et al.
Published: (2026)

ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
by: Wang, Shuai, et al.
Published: (2025)

From Clouds to Hallucinations: Atmospheric Retrieval Hijacking in Remote Sensing Vision-Language RAG
by: Han, Jiaju, et al.
Published: (2026)

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
by: Wang, Jun, et al.
Published: (2026)

MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems
by: Yang, Peiru, et al.
Published: (2025)

First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection
by: Liu, Wutao, et al.
Published: (2025)

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design
by: Tang, Wenxin, et al.
Published: (2025)

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
by: Xu, Haidong, et al.
Published: (2025)

End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
by: Zheng, Qiaoyu, et al.
Published: (2025)

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
by: Fu, Honghao, et al.
Published: (2026)

FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation
by: Li, Gen, et al.
Published: (2026)

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
by: Yu, Shi, et al.
Published: (2024)

Towards Visual Text Design Transfer Across Languages
by: Choi, Yejin, et al.
Published: (2024)

When RAG Hurts: Diagnosing and Mitigating Attention Distraction in Retrieval-Augmented LVLMs
by: Zhao, Beidi, et al.
Published: (2026)

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
by: Hu, Chan-Wei, et al.
Published: (2025)

Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems
by: Zhang, Tuo, et al.
Published: (2025)

World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge
by: Son, Moo Hyun, et al.
Published: (2025)

Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
by: Yan, Peizheng, et al.
Published: (2026)

Multimodal RAG Enhanced Visual Description
by: Jaiswal, Amit Kumar, et al.
Published: (2025)

MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
by: Berman, William, et al.
Published: (2024)

CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval
by: Liu, Yating, et al.
Published: (2023)