:: Library Catalog

Bibliographic Details
Main Authors:	Nguyen, Viet; Nguyen, Thao; Patel, Vishal M.; Li, Yuheng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Computation and Language; Information Retrieval
Online Access:	https://arxiv.org/abs/2605.28806

Similar Items

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)

Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
by: Xu, Yexing, et al.
Published: (2026)

Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
by: Yan, Yibo, et al.
Published: (2026)

Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
by: Xu, Tao
Published: (2026)

Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
by: Liu, Peiyang, et al.
Published: (2026)

Attention Grounded Enhancement for Visual Document Retrieval
by: Cui, Wanqing, et al.
Published: (2025)

Personalized Multimodal Large Language Models: A Survey
by: Wu, Junda, et al.
Published: (2024)

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
by: Zhou, Junjie, et al.
Published: (2024)

Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
by: Long, Xinwei, et al.
Published: (2025)

Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)

Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
by: Yan, Yibo, et al.
Published: (2026)

KiseKloset for Fashion Retrieval and Recommendation
by: Phan-Nguyen, Thanh-Tung, et al.
Published: (2025)

Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
by: Song, Tingyu, et al.
Published: (2026)

ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
by: Zou, Henry Peng, et al.
Published: (2024)

Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
by: Luo, Weiqing, et al.
Published: (2026)

Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
by: Dai, Ziqi, et al.
Published: (2025)

Towards Text-Image Interleaved Retrieval
by: Zhang, Xin, et al.
Published: (2025)

Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)

InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
by: Hou, Bohan, et al.
Published: (2026)

Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
by: Guo, Hao, et al.
Published: (2025)

Learning Visual Composition through Improved Semantic Guidance
by: Stone, Austin, et al.
Published: (2024)

FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
by: Singh, Shubhankar, et al.
Published: (2024)

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
by: Tanaka, Ryota, et al.
Published: (2025)

LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
by: Narayan, Kartik, et al.
Published: (2025)

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
by: Zhang, Longxiang, et al.
Published: (2026)

Indexing Multimodal Language Models for Large-scale Image Retrieval
by: Tharwat, Bahey, et al.
Published: (2026)

Multi-Vector Index Compression in Any Modality
by: Qin, Hanxiang, et al.
Published: (2026)

ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System
by: Cha, Sungguk, et al.
Published: (2026)

Efficient and High-Fidelity Omni Modality Retrieval
by: Huynh, Chuong, et al.
Published: (2026)

Improving Applicability of Deep Learning based Token Classification models during Training
by: Mehra, Anket, et al.
Published: (2025)

ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
by: Liao, Baohao, et al.
Published: (2023)

Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented Generation
by: Zhao, Shu, et al.
Published: (2025)

Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)

E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)

Large Language Model Informed Patent Image Retrieval
by: Lo, Hao-Cheng, et al.
Published: (2024)

CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections
by: Schneider, Florian, et al.
Published: (2025)

TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
by: Shankarampeta, Abhilash, et al.
Published: (2025)