| Main Authors: | Kim, Sungyeon, Zhu, Xinliang, Lin, Xiaofan, Bastan, Muhammet, Gray, Douglas, Kwak, Suha |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.19868 |
Similar Items
Learning Unified Distance Metric Across Diverse Data Distributions with Parameter-Efficient Transfer Learning
by: Kim, Sungyeon, et al.
Published: (2023)
Smart Routing for Multimodal Video Retrieval: When to Search What
by: Rosa, Kevin Dela
Published: (2025)
MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising
by: Fu, Chenghan, et al.
Published: (2025)
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
by: Zhang, Zhixin, et al.
Published: (2024)
HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
by: Luo, Linyin, et al.
Published: (2025)
Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search
by: Chen, Lei, et al.
Published: (2026)
Dreaming User Multimodal Representation Guided by The Platonic Representation Hypothesis for Micro-Video Recommendation
by: Lin, Chengzhi, et al.
Published: (2024)
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
by: Lin, Sheng-Chieh, et al.
Published: (2024)
Scale Up Composed Image Retrieval Learning via Modification Text Generation
by: Zhou, Yinan, et al.
Published: (2025)
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)
Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
by: Zhu, Jing, et al.
Published: (2025)
MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
by: Hsiao, Chi-Hsiang, et al.
Published: (2025)
Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)
Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search
by: Yin, Xinlei, et al.
Published: (2026)
M3DR: Towards Universal Multilingual Multimodal Document Retrieval
by: Kolavi, Adithya S, et al.
Published: (2025)
LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)
Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems
by: Zhang, Tuo, et al.
Published: (2025)
Good Scores, Bad Data: A Metric for Multimodal Coherence
by: Srinivasan, Vasundra
Published: (2026)
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
by: Yang, Wei, et al.
Published: (2025)
DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval
by: Yang, Yuxin, et al.
Published: (2025)
Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships
by: Waseda, Futa, et al.
Published: (2024)
Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items
by: Lin, Jianghao, et al.
Published: (2025)
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
by: Chaubey, Ashutosh, et al.
Published: (2024)
TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
by: Mannam, Varun, et al.
Published: (2025)
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning
by: Zhang, Jinxu
Published: (2024)
RE-TRIANGLE: Does TRIANGLE Enable Multimodal Alignment Beyond Cosine Similarity in Retrieval?
by: Ghosh, Arijit, et al.
Published: (2026)
VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
by: Giahi, Ramin, et al.
Published: (2025)
Open Multimodal Retrieval-Augmented Factual Image Generation
by: Tian, Yang, et al.
Published: (2025)
Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
by: Thanh, Toan Le Ngo, et al.
Published: (2025)
HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)
Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)
From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding
by: Rizk, Basem, et al.
Published: (2025)
ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization
by: Guo, Yuanhe, et al.
Published: (2025)
A Survey of Multimodal Composite Editing and Retrieval
by: Li, Suyan, et al.
Published: (2024)
Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
by: Wang, Hao, et al.
Published: (2024)
V-Agent: An Interactive Video Search System Using Vision-Language Models
by: Park, SunYoung, et al.
Published: (2025)
The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding
by: Rossetto, Luca, et al.
Published: (2025)
Very Efficient Listwise Multimodal Reranking for Long Documents
by: Sun, Yiqun, et al.
Published: (2026)
LookSync: Large-Scale Visual Product Search System for AI-Generated Fashion Looks
by: M, Pradeep, et al.
Published: (2025)
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)