| Main Authors: | Nguyen, Viet; Nguyen, Thao; Patel, Vishal M.; Li, Yuheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.28806 |
Similar Items
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)
Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
by: Xu, Yexing, et al.
Published: (2026)
Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval
by: Yan, Yibo, et al.
Published: (2026)
Index Light, Reason Deep: Deferred Visual Ingestion for Visual-Dense Document Question Answering
by: Xu, Tao
Published: (2026)
Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
by: Liu, Peiyang, et al.
Published: (2026)
Attention Grounded Enhancement for Visual Document Retrieval
by: Cui, Wanqing, et al.
Published: (2025)
Personalized Multimodal Large Language Models: A Survey
by: Wu, Junda, et al.
Published: (2024)
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
by: Zhou, Junjie, et al.
Published: (2024)
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
by: Long, Xinwei, et al.
Published: (2025)
Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)
Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
by: Yan, Yibo, et al.
Published: (2026)
KiseKloset for Fashion Retrieval and Recommendation
by: Phan-Nguyen, Thanh-Tung, et al.
Published: (2025)
Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)
Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
by: Song, Tingyu, et al.
Published: (2026)
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
by: Zou, Henry Peng, et al.
Published: (2024)
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
by: Luo, Weiqing, et al.
Published: (2026)
Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
by: Dai, Ziqi, et al.
Published: (2025)
Towards Text-Image Interleaved Retrieval
by: Zhang, Xin, et al.
Published: (2025)
Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)
InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
by: Hou, Bohan, et al.
Published: (2026)
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
by: Guo, Hao, et al.
Published: (2025)
Learning Visual Composition through Improved Semantic Guidance
by: Stone, Austin, et al.
Published: (2024)
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
by: Singh, Shubhankar, et al.
Published: (2024)
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
by: Tanaka, Ryota, et al.
Published: (2025)
LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
by: Narayan, Kartik, et al.
Published: (2025)
Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
by: Zhang, Longxiang, et al.
Published: (2026)
Indexing Multimodal Language Models for Large-scale Image Retrieval
by: Tharwat, Bahey, et al.
Published: (2026)
Multi-Vector Index Compression in Any Modality
by: Qin, Hanxiang, et al.
Published: (2026)
ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System
by: Cha, Sungguk, et al.
Published: (2026)
Efficient and High-Fidelity Omni Modality Retrieval
by: Huynh, Chuong, et al.
Published: (2026)
Improving Applicability of Deep Learning based Token Classification models during Training
by: Mehra, Anket, et al.
Published: (2025)
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
by: Liao, Baohao, et al.
Published: (2023)
Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented Generation
by: Zhao, Shu, et al.
Published: (2025)
Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)
E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)
Large Language Model Informed Patent Image Retrieval
by: Lo, Hao-Cheng, et al.
Published: (2024)
CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections
by: Schneider, Florian, et al.
Published: (2025)
TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
by: Shankarampeta, Abhilash, et al.
Published: (2025)