:: Library Catalog

Bibliographic Details
Main Authors:	Kim, Sungyeon; Zhu, Xinliang; Lin, Xiaofan; Bastan, Muhammet; Gray, Douglas; Kwak, Suha
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval; Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2503.19868

Similar Items

Learning Unified Distance Metric Across Diverse Data Distributions with Parameter-Efficient Transfer Learning
by: Kim, Sungyeon, et al.
Published: (2023)

Smart Routing for Multimodal Video Retrieval: When to Search What
by: Rosa, Kevin Dela
Published: (2025)

MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising
by: Fu, Chenghan, et al.
Published: (2025)

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
by: Zhang, Zhixin, et al.
Published: (2024)

HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
by: Luo, Linyin, et al.
Published: (2025)

Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search
by: Chen, Lei, et al.
Published: (2026)

Dreaming User Multimodal Representation Guided by The Platonic Representation Hypothesis for Micro-Video Recommendation
by: Lin, Chengzhi, et al.
Published: (2024)

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
by: Lin, Sheng-Chieh, et al.
Published: (2024)

Scale Up Composed Image Retrieval Learning via Modification Text Generation
by: Zhou, Yinan, et al.
Published: (2025)

MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)

Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
by: Zhu, Jing, et al.
Published: (2025)

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
by: Hsiao, Chi-Hsiang, et al.
Published: (2025)

Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)

Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search
by: Yin, Xinlei, et al.
Published: (2026)

M3DR: Towards Universal Multilingual Multimodal Document Retrieval
by: Kolavi, Adithya S, et al.
Published: (2025)

LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)

Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems
by: Zhang, Tuo, et al.
Published: (2025)

Good Scores, Bad Data: A Metric for Multimodal Coherence
by: Srinivasan, Vasundra
Published: (2026)

OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
by: Yang, Wei, et al.
Published: (2025)

DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval
by: Yang, Yuxin, et al.
Published: (2025)

Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships
by: Waseda, Futa, et al.
Published: (2024)

Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items
by: Lin, Jianghao, et al.
Published: (2025)

ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
by: Chaubey, Ashutosh, et al.
Published: (2024)

TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
by: Mannam, Varun, et al.
Published: (2025)

Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning
by: Zhang, Jinxu
Published: (2024)

RE-TRIANGLE: Does TRIANGLE Enable Multimodal Alignment Beyond Cosine Similarity in Retrieval?
by: Ghosh, Arijit, et al.
Published: (2026)

VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
by: Giahi, Ramin, et al.
Published: (2025)

Open Multimodal Retrieval-Augmented Factual Image Generation
by: Tian, Yang, et al.
Published: (2025)

Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
by: Thanh, Toan Le Ngo, et al.
Published: (2025)

HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)

Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)

From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding
by: Rizk, Basem, et al.
Published: (2025)

ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization
by: Guo, Yuanhe, et al.
Published: (2025)

A Survey of Multimodal Composite Editing and Retrieval
by: Li, Suyan, et al.
Published: (2024)

Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
by: Wang, Hao, et al.
Published: (2024)

V-Agent: An Interactive Video Search System Using Vision-Language Models
by: Park, SunYoung, et al.
Published: (2025)

The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding
by: Rossetto, Luca, et al.
Published: (2025)

Very Efficient Listwise Multimodal Reranking for Long Documents
by: Sun, Yiqun, et al.
Published: (2026)

LookSync: Large-Scale Visual Product Search System for AI-Generated Fashion Looks
by: M, Pradeep, et al.
Published: (2025)

Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)