:: Library Catalog

Bibliographic Details
Main Authors:	Lu, Rong; Liu, Hao; Hou, Song
Format:	Preprint
Published:	2026
Subjects:	Information Retrieval; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2604.04997

Similar Items

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
by: Yu, Shi, et al.
Published: (2024)

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
by: Hsiao, Chi-Hsiang, et al.
Published: (2025)

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
by: Zhang, Longxiang, et al.
Published: (2026)

Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express
by: Aroraa, Cherag, et al.
Published: (2024)

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
by: Tanaka, Ryota, et al.
Published: (2025)

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
by: Wang, Qiuchen, et al.
Published: (2025)

MMDocIR: Benchmarking Multimodal Retrieval for Long Documents
by: Dong, Kuicai, et al.
Published: (2025)

Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
by: Guo, Hao, et al.
Published: (2025)

Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
by: Dong, Kuicai, et al.
Published: (2025)

M3DR: Towards Universal Multilingual Multimodal Document Retrieval
by: Kolavi, Adithya S, et al.
Published: (2025)

Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
by: Zhan, Jingtao, et al.
Published: (2024)

Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
by: Zhu, Jing, et al.
Published: (2025)

Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)

DSRAG: A Domain-Specific Retrieval Framework Based on Document-derived Multimodal Knowledge Graph
by: Yang, Mengzheng, et al.
Published: (2025)

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
by: Liu, Peiyang, et al.
Published: (2026)

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
by: Jiang, Dongzhi, et al.
Published: (2024)

E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)

ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents
by: Pala, Furkan, et al.
Published: (2024)

From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding
by: Rizk, Basem, et al.
Published: (2025)

Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
by: Zhou, Junjie, et al.
Published: (2024)

Self Knowledge Re-expression: A Fully Local Method for Adapting LLMs to Tasks Using Intrinsic Knowledge
by: Wang, Mengyu, et al.
Published: (2026)

Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
by: Shen, Wenxuan, et al.
Published: (2025)

PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration
by: Akbar, Abdul Rehman, et al.
Published: (2026)

XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags
by: Shohan, Faisal Tareque, et al.
Published: (2024)

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
by: Lin, Sheng-Chieh, et al.
Published: (2024)

Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)

TabRAG: Improving Tabular Document Question Answering for Retrieval Augmented Generation via Structured Representations
by: Si, Jacob, et al.
Published: (2025)

ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
by: Liao, Baohao, et al.
Published: (2023)

InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
by: Hou, Bohan, et al.
Published: (2026)

Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)

ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System
by: Cha, Sungguk, et al.
Published: (2026)

MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
by: Xiao, Zilin, et al.
Published: (2025)

EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
by: Zou, Henry Peng, et al.
Published: (2024)

Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
by: Xu, Zhengfei, et al.
Published: (2024)

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
by: Song, Tingyu, et al.
Published: (2026)

Evaluating Intelligence via Trial and Error
by: Zhan, Jingtao, et al.
Published: (2025)

Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
by: Yan, Yibo, et al.
Published: (2026)

Attention Grounded Enhancement for Visual Document Retrieval
by: Cui, Wanqing, et al.
Published: (2025)

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)