Library Catalog

Bibliographic Details
Main Authors:	Ghosh, Rahul, Liu, Chun-Hao, Rele, Gaurav, Ravipati, Vidya Sagar, Aouad, Hazar
Format:	Preprint
Published:	2025
Subjects:	Machine Learning, Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition, Information Retrieval, Multimedia
Online Access:	https://arxiv.org/abs/2601.16984

Similar Items

Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval
by: Kong, Fanheng, et al.
Published: (2025)

Telco-DPR: A Hybrid Dataset for Evaluating Retrieval Models of 3GPP Technical Specifications
by: Saraiva, Thaina, et al.
Published: (2024)

SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
by: Li, Haoxuan, et al.
Published: (2025)

MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
by: Wu, Qiyu, et al.
Published: (2025)

Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline
by: Messina, Nicola, et al.
Published: (2024)

A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels
by: Han, Haochen, et al.
Published: (2024)

Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)

Cross-Modal Retrieval with Cauchy-Schwarz Divergence
by: Zhang, Jiahao, et al.
Published: (2025)

From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)

Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)

Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
by: Li, Yongqi, et al.
Published: (2024)

Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective
by: Su, Taoyu, et al.
Published: (2025)

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2022)

Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning
by: Luo, Tianci, et al.
Published: (2026)

InfoCIR: Multimedia Analysis for Composed Image Retrieval
by: Dravilas, Ioannis, et al.
Published: (2026)

PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
by: Zou, Qiang, et al.
Published: (2025)

DSRAG: A Domain-Specific Retrieval Framework Based on Document-derived Multimodal Knowledge Graph
by: Yang, Mengzheng, et al.
Published: (2025)

MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)

Agentic Mixed-Source Multi-Modal Misinformation Detection with Adaptive Test-Time Scaling
by: Jiang, Wei, et al.
Published: (2026)

Interactive Multi-Turn Retrieval for Health Videos
by: Wu, Chengzheng, et al.
Published: (2026)

Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions
by: Wang, Tianshi, et al.
Published: (2023)

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues
by: Fu, Junchen, et al.
Published: (2026)

HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
by: Li, Jun, et al.
Published: (2025)

Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
by: Li, Po-han, et al.
Published: (2024)

Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
by: Wei, Tianxin, et al.
Published: (2024)

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2025)

Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
by: Messina, Nicola, et al.
Published: (2024)

Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
by: Wu, Yiming, et al.
Published: (2024)

Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
by: Hu, Fan, et al.
Published: (2025)

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)

Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)

GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos
by: Li, Minghan, et al.
Published: (2026)

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
by: Jiang, Haoyu, et al.
Published: (2024)

Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation
by: Yang, Jheng-Hong, et al.
Published: (2024)

RAG-VisualRec: An Open Resource for Vision- and Text-Enhanced Retrieval-Augmented Generation in Recommendation
by: Tourani, Ali, et al.
Published: (2025)

PC²: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval
by: Duan, Yue, et al.
Published: (2024)

Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
by: Kim, Wongyu, et al.
Published: (2025)

Multimodal Misinformation Detection using Large Vision-Language Models
by: Tahmasebi, Sahar, et al.
Published: (2024)