| Main Authors: | Lu, Rong; Liu, Hao; Hou, Song |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.04997 |
Similar Items
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
by: Yu, Shi, et al.
Published: (2024)
MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
by: Hsiao, Chi-Hsiang, et al.
Published: (2025)
Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
by: Zhang, Longxiang, et al.
Published: (2026)
Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express
by: Aroraa, Cherag, et al.
Published: (2024)
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
by: Tanaka, Ryota, et al.
Published: (2025)
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
by: Wang, Qiuchen, et al.
Published: (2025)
MMDocIR: Benchmarking Multimodal Retrieval for Long Documents
by: Dong, Kuicai, et al.
Published: (2025)
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
by: Guo, Hao, et al.
Published: (2025)
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
by: Dong, Kuicai, et al.
Published: (2025)
M3DR: Towards Universal Multilingual Multimodal Document Retrieval
by: Kolavi, Adithya S, et al.
Published: (2025)
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
by: Zhan, Jingtao, et al.
Published: (2024)
Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
by: Zhu, Jing, et al.
Published: (2025)
Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)
DSRAG: A Domain-Specific Retrieval Framework Based on Document-derived Multimodal Knowledge Graph
by: Yang, Mengzheng, et al.
Published: (2025)
Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
by: Liu, Peiyang, et al.
Published: (2026)
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
by: Jiang, Dongzhi, et al.
Published: (2024)
E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)
ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents
by: Pala, Furkan, et al.
Published: (2024)
From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding
by: Rizk, Basem, et al.
Published: (2025)
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
by: Zhou, Junjie, et al.
Published: (2024)
Self Knowledge Re-expression: A Fully Local Method for Adapting LLMs to Tasks Using Intrinsic Knowledge
by: Wang, Mengyu, et al.
Published: (2026)
Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
by: Shen, Wenxuan, et al.
Published: (2025)
PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration
by: Akbar, Abdul Rehman, et al.
Published: (2026)
XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags
by: Shohan, Faisal Tareque, et al.
Published: (2024)
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
by: Lin, Sheng-Chieh, et al.
Published: (2024)
Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)
TabRAG: Improving Tabular Document Question Answering for Retrieval Augmented Generation via Structured Representations
by: Si, Jacob, et al.
Published: (2025)
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
by: Liao, Baohao, et al.
Published: (2023)
InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
by: Hou, Bohan, et al.
Published: (2026)
Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)
ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System
by: Cha, Sungguk, et al.
Published: (2026)
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
by: Xiao, Zilin, et al.
Published: (2025)
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
by: Zou, Henry Peng, et al.
Published: (2024)
Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
by: Xu, Zhengfei, et al.
Published: (2024)
Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
by: Song, Tingyu, et al.
Published: (2026)
Evaluating Intelligence via Trial and Error
by: Zhan, Jingtao, et al.
Published: (2025)
Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
by: Yan, Yibo, et al.
Published: (2026)
Attention Grounded Enhancement for Visual Document Retrieval
by: Cui, Wanqing, et al.
Published: (2025)
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)