:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Mengzheng; Ren, Yanfei; Opoku, David Osei; Li, Ruochang; Ren, Peng; Xing, Chunxiao
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval; Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Multimedia
Online Access:	https://arxiv.org/abs/2509.10467

Similar Items

From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
by: Jiang, Haoyu, et al.
Published: (2024)

SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
by: Li, Haoxuan, et al.
Published: (2025)

MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
by: Wu, Qiyu, et al.
Published: (2025)

Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
by: Li, Po-han, et al.
Published: (2024)

Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval
by: Kong, Fanheng, et al.
Published: (2025)

Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)

Multimodal Misinformation Detection using Large Vision-Language Models
by: Tahmasebi, Sahar, et al.
Published: (2024)

Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering
by: Zhou, Ao, et al.
Published: (2025)

A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels
by: Han, Haochen, et al.
Published: (2024)

InfoCIR: Multimedia Analysis for Composed Image Retrieval
by: Dravilas, Ioannis, et al.
Published: (2026)

MMSRARec: Summarization and Retrieval Augumented Sequential Recommendation Based on Multimodal Large Language Model
by: Wang, Haoyu, et al.
Published: (2025)

A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task
by: Deng, Jiaqi, et al.
Published: (2025)

A Survey of Multimodal Composite Editing and Retrieval
by: Li, Suyan, et al.
Published: (2024)

Multimodal Learned Sparse Retrieval for Image Suggestion
by: Nguyen, Thong, et al.
Published: (2024)

MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)

Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
by: Li, Yongqi, et al.
Published: (2024)

EventCast: Hybrid Demand Forecasting in E-Commerce with LLM-Based Event Knowledge
by: Hu, Congcong, et al.
Published: (2026)

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2022)

Breaking the Curse of Knowledge: Towards Effective Multimodal Recommendation using Knowledge Soft Integration
by: Ouyang, Kai, et al.
Published: (2023)

Verifying Cross-modal Entity Consistency in News using Vision-language Models
by: Tahmasebi, Sahar, et al.
Published: (2025)

Interactive Multi-Turn Retrieval for Health Videos
by: Wu, Chengzheng, et al.
Published: (2026)

Very Efficient Listwise Multimodal Reranking for Long Documents
by: Sun, Yiqun, et al.
Published: (2026)

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)

HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
by: Li, Jun, et al.
Published: (2025)

U-Sticker: A Large-Scale Multi-Domain User Sticker Dataset for Retrieval and Personalization
by: Chee, Heng Er Metilda, et al.
Published: (2025)

HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
by: Zhang, Chen, et al.
Published: (2025)

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2025)

Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
by: Messina, Nicola, et al.
Published: (2024)

Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
by: Wu, Yiming, et al.
Published: (2024)

Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability
by: Zhou, Xin, et al.
Published: (2024)

Beyond Static Collision Handling: Adaptive Semantic ID Learning for Multimodal Recommendation at Industrial Scale
by: Pan, Yongsen, et al.
Published: (2026)

Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline
by: Messina, Nicola, et al.
Published: (2024)

The 2nd EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval
by: Fu, Junchen, et al.
Published: (2026)

Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)

Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)

GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos
by: Li, Minghan, et al.
Published: (2026)

NativE: Multi-modal Knowledge Graph Completion in the Wild
by: Zhang, Yichi, et al.
Published: (2024)

Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning
by: Luo, Tianci, et al.
Published: (2026)