| Main Authors: | Wang, Zhenyu, Li, Wenjia, Zhu, Pengyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.14332 |
Similar Items
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
by: Tang, Hengzhu, et al.
Published: (2025)
Music Recommendation Based on Facial Emotion Recognition
by: B, Rajesh, et al.
Published: (2024)
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
by: Meng, GuangHao, et al.
Published: (2025)
I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking
by: Liu, Ziyan, et al.
Published: (2025)
Rethinking Detection Based Table Structure Recognition for Visually Rich Document Images
by: Xiao, Bin, et al.
Published: (2023)
MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest
by: Yang, Xiao, et al.
Published: (2025)
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization
by: Lv, Zheqi, et al.
Published: (2022)
UNION: A Lightweight Target Representation for Efficient Zero-Shot Image-Guided Retrieval with Optional Textual Queries
by: Le, Hoang-Bao, et al.
Published: (2025)
YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions
by: Li, Xiguang, et al.
Published: (2024)
CoLLM: A Large Language Model for Composed Image Retrieval
by: Huynh, Chuong, et al.
Published: (2025)
FF-PNet: A Pyramid Network Based on Feature and Field for Brain Image Registration
by: Zhang, Ying, et al.
Published: (2025)
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)
Zero-Shot Hashing Based on Reconstruction With Part Alignment
by: Jiang, Yan, et al.
Published: (2025)
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
by: De Nadai, Marco, et al.
Published: (2025)
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
by: Deng, Jiaqi, et al.
Published: (2025)
Leveraging Foundation Models for Content-Based Image Retrieval in Radiology
by: Denner, Stefan, et al.
Published: (2024)
UniNote: A Unified Embedding Model for Multimodal Representation and Ranking
by: Zhao, Jinghan, et al.
Published: (2026)
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework
by: Ortego, Diego, et al.
Published: (2025)
Personalized Video Summarization using Text-Based Queries and Conditional Modeling
by: Huang, Jia-Hong
Published: (2024)
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models
by: Nakata, Kengo, et al.
Published: (2024)
Iterative Optimal Attention and Local Model for Single Image Rain Streak Removal
by: Li, Xiangyu, et al.
Published: (2025)
TrajSV: A Trajectory-based Model for Sports Video Representations and Applications
by: Wang, Zheng, et al.
Published: (2025)
MIRACL-VISION: A Large, multilingual, visual document retrieval benchmark
by: Osmulski, Radek, et al.
Published: (2025)
EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
by: Yang, Ruijie, et al.
Published: (2024)
A Flexible and Scalable Framework for Video Moment Search
by: Zhang, Chongzhi, et al.
Published: (2025)
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval
by: Molina, Adrià, et al.
Published: (2024)
GMM-Based Comprehensive Feature Extraction and Relative Distance Preservation For Few-Shot Cross-Modal Retrieval
by: Sun, Chengsong, et al.
Published: (2025)
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
by: Lin, Lin, et al.
Published: (2025)
SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition
by: Zhu, Minghao, et al.
Published: (2025)
E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)
Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning
by: Lu, Yingling, et al.
Published: (2024)
Optimizing Multi-Modal Models for Image-Based Shape Retrieval: The Role of Pre-Alignment and Hard Contrastive Learning
by: Kühn, Paul Julius, et al.
Published: (2026)
MealRec: Multi-granularity Sequential Modeling via Hierarchical Diffusion Models for Micro-Video Recommendation
by: Dong, Xinxin, et al.
Published: (2026)
NextAds: Towards Next-generation Personalized Video Advertising
by: Xu, Yiyan, et al.
Published: (2026)
Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval
by: Balloli, Vaibhav, et al.
Published: (2024)
A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task
by: Deng, Jiaqi, et al.
Published: (2025)
Large Language Model Informed Patent Image Retrieval
by: Lo, Hao-Cheng, et al.
Published: (2024)
Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation
by: Shatri, Elona, et al.
Published: (2024)
Automating Iconclass: LLMs and RAG for Large-Scale Classification of Religious Woodcuts
by: Thomas, Drew B.
Published: (2025)
Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)