| Main Authors: | Zhang, Wenyi; Jia, Ju; Jia, Xiaojun; Huang, Yihao; Li, Xinfeng; Wu, Cong; Wang, Lina |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.11509 |
Similar Items
Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)
Personalized Video Summarization using Text-Based Queries and Conditional Modeling
by: Huang, Jia-Hong
Published: (2024)
IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
by: Fu, Junchen, et al.
Published: (2024)
FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding
by: Feng, Kaidong, et al.
Published: (2026)
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
by: Narayan, Kartik, et al.
Published: (2025)
ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence
by: Shi, Zhuofan, et al.
Published: (2026)
Pretrain-then-Adapt: Uncertainty-Aware Test-Time Adaptation for Text-based Person Search
by: Zhang, Jiahao, et al.
Published: (2026)
Adapting MLLMs for Nuanced Video Retrieval
by: Bagad, Piyush, et al.
Published: (2025)
LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation
by: Wu, Wangyu, et al.
Published: (2025)
X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation
by: Lyu, Hanjia, et al.
Published: (2024)
From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)
A Novel Evaluation Framework for Image2Text Generation
by: Huang, Jia-Hong, et al.
Published: (2024)
GraphRevisedIE: Multimodal Information Extraction with Graph-Revised Network
by: Cao, Panfeng, et al.
Published: (2024)
Automatic Creative Selection with Cross-Modal Matching
by: Kim, Alex, et al.
Published: (2024)
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
by: Deng, Chenlong, et al.
Published: (2026)
Beyond Global Similarity: Towards Fine-Grained, Multi-Condition Multimodal Retrieval
by: Lu, Xuan, et al.
Published: (2026)
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
by: Zhan, Jingtao, et al.
Published: (2024)
Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval
by: Nozawa, Yuji, et al.
Published: (2025)
The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding
by: Rossetto, Luca, et al.
Published: (2025)
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)
Joint graph entropy knowledge distillation for point cloud classification and robustness against corruptions
by: Tian, Zhiqiang, et al.
Published: (2025)
Entity Image and Mixed-Modal Image Retrieval Datasets
by: Blaga, Cristian-Ioan, et al.
Published: (2025)
AutothinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction
by: Yang, Jiashu, et al.
Published: (2026)
MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
by: Samuel, Saron, et al.
Published: (2025)
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control
by: Nguyen, Thong, et al.
Published: (2024)
ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
by: Mao, Chen, et al.
Published: (2024)
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)
Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding
by: Li, Da, et al.
Published: (2025)
Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
by: Fu, Junchen, et al.
Published: (2024)
Multimodal Language Models for Domain-Specific Procedural Video Summarization
by: Hussain, Nafisa
Published: (2024)
LOVO: Efficient Complex Object Query in Large-Scale Video Datasets
by: Liu, Yuxin, et al.
Published: (2025)
FIGROTD: A Friendly-to-Handle Dataset for Image Guided Retrieval with Optional Text
by: Le, Hoang-Bao, et al.
Published: (2025)
UniNote: A Unified Embedding Model for Multimodal Representation and Ranking
by: Zhao, Jinghan, et al.
Published: (2026)
BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment
by: Mounis, Mohamed Darwish, et al.
Published: (2026)
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts
by: Cai, Qifeng, et al.
Published: (2025)
MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
by: Zhou, Junjie, et al.
Published: (2025)
Prototype-Driven Structure Synergy Network for Remote Sensing Images Segmentation
by: Wang, Junyi, et al.
Published: (2025)
Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild
by: Wei, Tianqi, et al.
Published: (2024)