| Main Authors: | Shim, Alexander, Saieh, Khalil, Clarke, Samuel |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.08226 |
Similar Items
ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG
by: Zhang, Zilun, et al.
Published: (2024)
Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval
by: Lim, Youngsun, et al.
Published: (2024)
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
by: Lim, Youngsun, et al.
Published: (2024)
mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
by: Yuan, Xu, et al.
Published: (2025)
Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation
by: Zhu, Mengdan, et al.
Published: (2025)
Clustering-based Image-Text Graph Matching for Domain Generalization
by: Park, Nokyung, et al.
Published: (2023)
M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
by: Anugraha, David, et al.
Published: (2025)
RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition
by: Sivaroopan, Nirhoshan, et al.
Published: (2025)
MV-RAG: Retrieval Augmented Multiview Diffusion
by: Dayani, Yosef, et al.
Published: (2025)
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering
by: Naeem, Awais, et al.
Published: (2024)
Mesh RAG: Retrieval Augmentation for Autoregressive Mesh Generation
by: Sun, Xiatao, et al.
Published: (2025)
Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation
by: Sanguigni, Fulvio, et al.
Published: (2025)
QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering
by: Jiang, Zhuohang, et al.
Published: (2025)
MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
by: Hsiao, Chi-Hsiang, et al.
Published: (2025)
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework
by: Yang, Yuming, et al.
Published: (2025)
CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models
by: Alam, Hasan Md Tusfiqur, et al.
Published: (2025)
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
by: Luo, Yongdong, et al.
Published: (2024)
SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
by: Zeng, Nianbo, et al.
Published: (2025)
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
by: Ha, Hyeonjeong, et al.
Published: (2025)
VLADriver-RAG: Retrieval-Augmented Vision-Language-Action Models for Autonomous Driving
by: Zhao, Rui, et al.
Published: (2026)
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
by: Wang, Shuai, et al.
Published: (2025)
From Clouds to Hallucinations: Atmospheric Retrieval Hijacking in Remote Sensing Vision-Language RAG
by: Han, Jiaju, et al.
Published: (2026)
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
by: Wang, Jun, et al.
Published: (2026)
MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems
by: Yang, Peiru, et al.
Published: (2025)
First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection
by: Liu, Wutao, et al.
Published: (2025)
SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design
by: Tang, Wenxin, et al.
Published: (2025)
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
by: Xu, Haidong, et al.
Published: (2025)
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
by: Zheng, Qiaoyu, et al.
Published: (2025)
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
by: Fu, Honghao, et al.
Published: (2026)
FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation
by: Li, Gen, et al.
Published: (2026)
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
by: Yu, Shi, et al.
Published: (2024)
Towards Visual Text Design Transfer Across Languages
by: Choi, Yejin, et al.
Published: (2024)
When RAG Hurts: Diagnosing and Mitigating Attention Distraction in Retrieval-Augmented LVLMs
by: Zhao, Beidi, et al.
Published: (2026)
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
by: Hu, Chan-Wei, et al.
Published: (2025)
Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems
by: Zhang, Tuo, et al.
Published: (2025)
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge
by: Son, Moo Hyun, et al.
Published: (2025)
Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
by: Yan, Peizheng, et al.
Published: (2026)
Multimodal RAG Enhanced Visual Description
by: Jaiswal, Amit Kumar, et al.
Published: (2025)
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
by: Berman, William, et al.
Published: (2024)
CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval
by: Liu, Yating, et al.
Published: (2023)