| Main Authors: | Ghosh, Rahul, Liu, Chun-Hao, Rele, Gaurav, Ravipati, Vidya Sagar, Aouad, Hazar |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.16984 |
Similar Items
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval
by: Kong, Fanheng, et al.
Published: (2025)
Telco-DPR: A Hybrid Dataset for Evaluating Retrieval Models of 3GPP Technical Specifications
by: Saraiva, Thaina, et al.
Published: (2024)
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
by: Li, Haoxuan, et al.
Published: (2025)
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
by: Wu, Qiyu, et al.
Published: (2025)
Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline
by: Messina, Nicola, et al.
Published: (2024)
A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels
by: Han, Haochen, et al.
Published: (2024)
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)
Cross-Modal Retrieval with Cauchy-Schwarz Divergence
by: Zhang, Jiahao, et al.
Published: (2025)
From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)
Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
by: Li, Yongqi, et al.
Published: (2024)
Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective
by: Su, Taoyu, et al.
Published: (2025)
Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2022)
Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning
by: Luo, Tianci, et al.
Published: (2026)
InfoCIR: Multimedia Analysis for Composed Image Retrieval
by: Dravilas, Ioannis, et al.
Published: (2026)
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
by: Zou, Qiang, et al.
Published: (2025)
DSRAG: A Domain-Specific Retrieval Framework Based on Document-derived Multimodal Knowledge Graph
by: Yang, Mengzheng, et al.
Published: (2025)
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)
Agentic Mixed-Source Multi-Modal Misinformation Detection with Adaptive Test-Time Scaling
by: Jiang, Wei, et al.
Published: (2026)
Interactive Multi-Turn Retrieval for Health Videos
by: Wu, Chengzheng, et al.
Published: (2026)
Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions
by: Wang, Tianshi, et al.
Published: (2023)
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)
Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues
by: Fu, Junchen, et al.
Published: (2026)
HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
by: Li, Jun, et al.
Published: (2025)
Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
by: Li, Po-han, et al.
Published: (2024)
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
by: Wei, Tianxin, et al.
Published: (2024)
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2025)
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
by: Messina, Nicola, et al.
Published: (2024)
Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
by: Wu, Yiming, et al.
Published: (2024)
Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
by: Hu, Fan, et al.
Published: (2025)
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)
Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)
GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos
by: Li, Minghan, et al.
Published: (2026)
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
by: Jiang, Haoyu, et al.
Published: (2024)
Toward Automatic Relevance Judgment using Vision--Language Models for Image--Text Retrieval Evaluation
by: Yang, Jheng-Hong, et al.
Published: (2024)
RAG-VisualRec: An Open Resource for Vision- and Text-Enhanced Retrieval-Augmented Generation in Recommendation
by: Tourani, Ali, et al.
Published: (2025)
PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval
by: Duan, Yue, et al.
Published: (2024)
Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
by: Kim, Wongyu, et al.
Published: (2025)
Multimodal Misinformation Detection using Large Vision-Language Models
by: Tahmasebi, Sahar, et al.
Published: (2024)