| Main Authors: | Chen, Junyu, Gao, Yihua, Ge, Mingyuan, Li, Mingyong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.09256 |
Similar Items
Visual Semantic Description Generation with MLLMs for Image-Text Matching
by: Chen, Junyu, et al.
Published: (2025)
GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos
by: Li, Minghan, et al.
Published: (2026)
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)
Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
by: Messina, Nicola, et al.
Published: (2024)
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2025)
VKIE: The Application of Key Information Extraction on Video Text
by: An, Siyu, et al.
Published: (2023)
HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
by: Li, Jun, et al.
Published: (2025)
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
by: Yang, Yuchen, et al.
Published: (2024)
Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline
by: Messina, Nicola, et al.
Published: (2024)
Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
by: Hu, Fan, et al.
Published: (2025)
Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning
by: Luo, Tianci, et al.
Published: (2026)
Interactive Multi-Turn Retrieval for Health Videos
by: Wu, Chengzheng, et al.
Published: (2026)
Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues
by: Fu, Junchen, et al.
Published: (2026)
VisTopics: A Visual Semantic Unsupervised Approach to Topic Modeling of Video and Image Data
by: Lokmanoglu, Ayse D, et al.
Published: (2025)
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
by: Zhang, Pingping, et al.
Published: (2024)
Toward Automatic Relevance Judgment using Vision–Language Models for Image–Text Retrieval Evaluation
by: Yang, Jheng-Hong, et al.
Published: (2024)
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)
Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
by: Lian, Niu, et al.
Published: (2025)
Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation
by: Liu, Han, et al.
Published: (2025)
Efficient Self-Supervised Video Hashing with Selective State Spaces
by: Wang, Jinpeng, et al.
Published: (2024)
PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval
by: Xu, Tianyi, et al.
Published: (2026)
CMIE: Combining MLLM Insights with External Evidence for Explainable Out-of-Context Misinformation Detection
by: Li, Fanxiao, et al.
Published: (2025)
Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
by: Wu, Yiming, et al.
Published: (2024)
Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
by: Li, Po-han, et al.
Published: (2024)
CLOSP: A Unified Semantic Space for SAR, MSI, and Text in Remote Sensing
by: Cambrin, Daniele Rege, et al.
Published: (2025)
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)
A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task
by: Deng, Jiaqi, et al.
Published: (2025)
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval
by: Kong, Fanheng, et al.
Published: (2025)
From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
by: Jiang, Haoyu, et al.
Published: (2024)
A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels
by: Han, Haochen, et al.
Published: (2024)
Revisiting Uncertainty: On Evidential Learning for Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)
Understanding the Performance Plateau in Text-to-Video Retrieval: A Comprehensive Empirical and Linguistic Analysis
by: Pegia, Maria-Eirini, et al.
Published: (2026)
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
by: Jing, Xiaolun, et al.
Published: (2024)
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
by: Zou, Qiang, et al.
Published: (2025)
Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval
by: Lin, Junan, et al.
Published: (2025)
A Comprehensive Survey on Composed Image Retrieval
by: Song, Xuemeng, et al.
Published: (2025)
Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language Models
by: Williams-Lekuona, Mikel, et al.
Published: (2025)