:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Wenyi; Jia, Ju; Jia, Xiaojun; Huang, Yihao; Li, Xinfeng; Wu, Cong; Wang, Lina
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.11509

Similar Items

Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)

Personalized Video Summarization using Text-Based Queries and Conditional Modeling
by: Huang, Jia-Hong
Published: (2024)

IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
by: Fu, Junchen, et al.
Published: (2024)

FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding
by: Feng, Kaidong, et al.
Published: (2026)

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
by: Narayan, Kartik, et al.
Published: (2025)

ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence
by: Shi, Zhuofan, et al.
Published: (2026)

Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search
by: Zhang, Jiahao, et al.
Published: (2026)

Adapting MLLMs for Nuanced Video Retrieval
by: Bagad, Piyush, et al.
Published: (2025)

LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation
by: Wu, Wangyu, et al.
Published: (2025)

X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation
by: Lyu, Hanjia, et al.
Published: (2024)

From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)

A Novel Evaluation Framework for Image2Text Generation
by: Huang, Jia-Hong, et al.
Published: (2024)

GraphRevisedIE: Multimodal Information Extraction with Graph-Revised Network
by: Cao, Panfeng, et al.
Published: (2024)

Automatic Creative Selection with Cross-Modal Matching
by: Kim, Alex, et al.
Published: (2024)

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
by: Deng, Chenlong, et al.
Published: (2026)

Beyond Global Similarity: Towards Fine-Grained, Multi-Condition Multimodal Retrieval
by: Lu, Xuan, et al.
Published: (2026)

MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)

Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
by: Zhan, Jingtao, et al.
Published: (2024)

Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval
by: Nozawa, Yuji, et al.
Published: (2025)

The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding
by: Rossetto, Luca, et al.
Published: (2025)

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)

Joint graph entropy knowledge distillation for point cloud classification and robustness against corruptions
by: Tian, Zhiqiang, et al.
Published: (2025)

Entity Image and Mixed-Modal Image Retrieval Datasets
by: Blaga, Cristian-Ioan, et al.
Published: (2025)

AutothinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction
by: Yang, Jiashu, et al.
Published: (2026)

MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
by: Samuel, Saron, et al.
Published: (2025)

Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control
by: Nguyen, Thong, et al.
Published: (2024)

ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
by: Mao, Chen, et al.
Published: (2024)

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)

Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding
by: Li, Da, et al.
Published: (2025)

Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
by: Fu, Junchen, et al.
Published: (2024)

Multimodal Language Models for Domain-Specific Procedural Video Summarization
by: Hussain, Nafisa
Published: (2024)

LOVO: Efficient Complex Object Query in Large-Scale Video Datasets
by: Liu, Yuxin, et al.
Published: (2025)

FIGROTD: A Friendly-to-Handle Dataset for Image Guided Retrieval with Optional Text
by: Le, Hoang-Bao, et al.
Published: (2025)

UniNote: A Unified Embedding Model for Multimodal Representation and Ranking
by: Zhao, Jinghan, et al.
Published: (2026)

BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
by: Mounis, Mohamed Darwish, et al.
Published: (2026)

LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts
by: Cai, Qifeng, et al.
Published: (2025)

MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
by: Zhou, Junjie, et al.
Published: (2025)

Prototype-Driven Structure Synergy Network for Remote Sensing Images Segmentation
by: Wang, Junyi, et al.
Published: (2025)

Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild
by: Wei, Tianqi, et al.
Published: (2024)