| Main Authors: | Yang, Mengzheng; Ren, Yanfei; Opoku, David Osei; Li, Ruochang; Ren, Peng; Xing, Chunxiao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.10467 |
Similar Items
From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
by: Jiang, Haoyu, et al.
Published: (2024)
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
by: Li, Haoxuan, et al.
Published: (2025)
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
by: Wu, Qiyu, et al.
Published: (2025)
Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
by: Li, Po-han, et al.
Published: (2024)
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval
by: Kong, Fanheng, et al.
Published: (2025)
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)
Multimodal Misinformation Detection using Large Vision-Language Models
by: Tahmasebi, Sahar, et al.
Published: (2024)
Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering
by: Zhou, Ao, et al.
Published: (2025)
A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels
by: Han, Haochen, et al.
Published: (2024)
InfoCIR: Multimedia Analysis for Composed Image Retrieval
by: Dravilas, Ioannis, et al.
Published: (2026)
MMSRARec: Summarization and Retrieval Augumented Sequential Recommendation Based on Multimodal Large Language Model
by: Wang, Haoyu, et al.
Published: (2025)
A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task
by: Deng, Jiaqi, et al.
Published: (2025)
A Survey of Multimodal Composite Editing and Retrieval
by: Li, Suyan, et al.
Published: (2024)
Multimodal Learned Sparse Retrieval for Image Suggestion
by: Nguyen, Thong, et al.
Published: (2024)
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
by: Li, Yongqi, et al.
Published: (2024)
EventCast: Hybrid Demand Forecasting in E-Commerce with LLM-Based Event Knowledge
by: Hu, Congcong, et al.
Published: (2026)
Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2022)
Breaking the Curse of Knowledge: Towards Effective Multimodal Recommendation using Knowledge Soft Integration
by: Ouyang, Kai, et al.
Published: (2023)
Verifying Cross-modal Entity Consistency in News using Vision-language Models
by: Tahmasebi, Sahar, et al.
Published: (2025)
Interactive Multi-Turn Retrieval for Health Videos
by: Wu, Chengzheng, et al.
Published: (2026)
Very Efficient Listwise Multimodal Reranking for Long Documents
by: Sun, Yiqun, et al.
Published: (2026)
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)
HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
by: Li, Jun, et al.
Published: (2025)
U-Sticker: A Large-Scale Multi-Domain User Sticker Dataset for Retrieval and Personalization
by: Chee, Heng Er Metilda, et al.
Published: (2025)
HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression
by: Zhang, Chen, et al.
Published: (2025)
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2025)
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
by: Messina, Nicola, et al.
Published: (2024)
Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
by: Wu, Yiming, et al.
Published: (2024)
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability
by: Zhou, Xin, et al.
Published: (2024)
Beyond Static Collision Handling: Adaptive Semantic ID Learning for Multimodal Recommendation at Industrial Scale
by: Pan, Yongsen, et al.
Published: (2026)
Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline
by: Messina, Nicola, et al.
Published: (2024)
The 2nd EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval
by: Fu, Junchen, et al.
Published: (2026)
Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)
Visual Lifelog Retrieval through Captioning-Enhanced Interpretation
by: Shih, Yu-Fei, et al.
Published: (2025)
GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos
by: Li, Minghan, et al.
Published: (2026)
NativE: Multi-modal Knowledge Graph Completion in the Wild
by: Zhang, Yichi, et al.
Published: (2024)
Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning
by: Luo, Tianci, et al.
Published: (2026)