:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Chen, Junyu, Gao, Yihua, Ge, Mingyuan, Li, Mingyong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition Information Retrieval Multimedia
Online Access:	https://arxiv.org/abs/2507.09256
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Visual Semantic Description Generation with MLLMs for Image-Text Matching
by: Chen, Junyu, et al.
Published: (2025)

GenState-AI: State-Aware Dataset for Text-to-Video Retrieval on AI-Generated Videos
by: Li, Minghan, et al.
Published: (2026)

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)

Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval
by: Messina, Nicola, et al.
Published: (2024)

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2024)

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
by: Xiao, Jian, et al.
Published: (2025)

VKIE: The Application of Key Information Extraction on Video Text
by: An, Siyu, et al.
Published: (2023)

HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
by: Li, Jun, et al.
Published: (2025)

Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
by: Yang, Yuchen, et al.
Published: (2024)

Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline
by: Messina, Nicola, et al.
Published: (2024)

Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search
by: Hu, Fan, et al.
Published: (2025)

Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning
by: Luo, Tianci, et al.
Published: (2026)

Interactive Multi-Turn Retrieval for Health Videos
by: Wu, Chengzheng, et al.
Published: (2026)

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues
by: Fu, Junchen, et al.
Published: (2026)

VisTopics: A Visual Semantic Unsupervised Approach to Topic Modeling of Video and Image Data
by: Lokmanoglu, Ayse D, et al.
Published: (2025)

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
by: Zhang, Pingping, et al.
Published: (2024)

Toward Automatic Relevance Judgment using Vision--Language Models for Image--Text Retrieval Evaluation
by: Yang, Jheng-Hong, et al.
Published: (2024)

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)

Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)

AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
by: Lian, Niu, et al.
Published: (2025)

Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation
by: Liu, Han, et al.
Published: (2025)

Efficient Self-Supervised Video Hashing with Selective State Spaces
by: Wang, Jinpeng, et al.
Published: (2024)

PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval
by: Xu, Tianyi, et al.
Published: (2026)

CMIE: Combining MLLM Insights with External Evidence for Explainable Out-of-Context Misinformation Detection
by: Li, Fanxiao, et al.
Published: (2025)

Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
by: Wu, Yiming, et al.
Published: (2024)

Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
by: Li, Po-han, et al.
Published: (2024)

CLOSP: A Unified Semantic Space for SAR, MSI, and Text in Remote Sensing
by: Cambrin, Daniele Rege, et al.
Published: (2025)

Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)

A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task
by: Deng, Jiaqi, et al.
Published: (2025)

Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval
by: Kong, Fanheng, et al.
Published: (2025)

From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion
by: Wang, Zheng, et al.
Published: (2025)

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
by: Jiang, Haoyu, et al.
Published: (2024)

A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels
by: Han, Haochen, et al.
Published: (2024)

Revisiting Uncertainty: On Evidential Learning for Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)

Understanding the Performance Plateau in Text-to-Video Retrieval: A Comprehensive Empirical and Linguistic Analysis
by: Pegia, Maria-Eirini, et al.
Published: (2026)

An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
by: Jing, Xiaolun, et al.
Published: (2024)

PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
by: Zou, Qiang, et al.
Published: (2025)

Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval
by: Lin, Junan, et al.
Published: (2025)

A Comprehensive Survey on Composed Image Retrieval
by: Song, Xuemeng, et al.
Published: (2025)

Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language Models
by: Williams-Lekuona, Mikel, et al.
Published: (2025)