:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zhenyu; Li, Wenjia; Zhu, Pengyu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Information Retrieval
Online Access:	https://arxiv.org/abs/2502.14332

Similar Items

Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
by: Tang, Hengzhu, et al.
Published: (2025)

Music Recommendation Based on Facial Emotion Recognition
by: B, Rajesh, et al.
Published: (2024)

EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
by: Meng, GuangHao, et al.
Published: (2025)

I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking
by: Liu, Ziyan, et al.
Published: (2025)

Rethinking Detection Based Table Structure Recognition for Visually Rich Document Images
by: Xiao, Bin, et al.
Published: (2023)

MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest
by: Yang, Xiao, et al.
Published: (2025)

DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization
by: Lv, Zheqi, et al.
Published: (2022)

UNION: A Lightweight Target Representation for Efficient Zero-Shot Image-Guided Retrieval with Optional Textual Queries
by: Le, Hoang-Bao, et al.
Published: (2025)

YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions
by: Li, Xiguang, et al.
Published: (2024)

CoLLM: A Large Language Model for Composed Image Retrieval
by: Huynh, Chuong, et al.
Published: (2025)

FF-PNet: A Pyramid Network Based on Feature and Field for Brain Image Registration
by: Zhang, Ying, et al.
Published: (2025)

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)

Zero-Shot Hashing Based on Reconstruction With Part Alignment
by: Jiang, Yan, et al.
Published: (2025)

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
by: De Nadai, Marco, et al.
Published: (2025)

Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)

Leveraging Foundation Models for Content-Based Image Retrieval in Radiology
by: Denner, Stefan, et al.
Published: (2024)

UniNote: A Unified Embedding Model for Multimodal Representation and Ranking
by: Zhao, Jinghan, et al.
Published: (2026)

Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework
by: Ortego, Diego, et al.
Published: (2025)

Personalized Video Summarization using Text-Based Queries and Conditional Modeling
by: Huang, Jia-Hong
Published: (2024)

Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models
by: Nakata, Kengo, et al.
Published: (2024)

Iterative Optimal Attention and Local Model for Single Image Rain Streak Removal
by: Li, Xiangyu, et al.
Published: (2025)

TrajSV: A Trajectory-based Model for Sports Video Representations and Applications
by: Wang, Zheng, et al.
Published: (2025)

MIRACL-VISION: A Large, multilingual, visual document retrieval benchmark
by: Osmulski, Radek, et al.
Published: (2025)

EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
by: Yang, Ruijie, et al.
Published: (2024)

A Flexible and Scalable Framework for Video Moment Search
by: Zhang, Chongzhi, et al.
Published: (2025)

Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval
by: Molina, Adrià, et al.
Published: (2024)

GMM-Based Comprehensive Feature Extraction and Relative Distance Preservation For Few-Shot Cross-Modal Retrieval
by: Sun, Chengsong, et al.
Published: (2025)

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
by: Lin, Lin, et al.
Published: (2025)

SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition
by: Zhu, Minghao, et al.
Published: (2025)

E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)

Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning
by: Lu, Yingling, et al.
Published: (2024)

Optimizing Multi-Modal Models for Image-Based Shape Retrieval: The Role of Pre-Alignment and Hard Contrastive Learning
by: Kühn, Paul Julius, et al.
Published: (2026)

MealRec: Multi-granularity Sequential Modeling via Hierarchical Diffusion Models for Micro-Video Recommendation
by: Dong, Xinxin, et al.
Published: (2026)

NextAds: Towards Next-generation Personalized Video Advertising
by: Xu, Yiyan, et al.
Published: (2026)

Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval
by: Balloli, Vaibhav, et al.
Published: (2024)

A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task
by: Deng, Jiaqi, et al.
Published: (2025)

Large Language Model Informed Patent Image Retrieval
by: Lo, Hao-Cheng, et al.
Published: (2024)

Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation
by: Shatri, Elona, et al.
Published: (2024)

Automating Iconclass: LLMs and RAG for Large-Scale Classification of Religious Woodcuts
by: Thomas, Drew B.
Published: (2025)

Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)