:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Kai; Chen, Xingyu; Zhang, Xiaofeng
Format:	Preprint
Published:	2025
Subjects:	Graphics; Computer Vision and Pattern Recognition; Information Retrieval; Information Theory
Online Access:	https://arxiv.org/abs/2505.12782

Similar Items

CTR-Driven Advertising Image Generation with Multimodal Large Language Models
by: Chen, Xingye, et al.
Published: (2025)

FreeEnricher: Enriching Face Landmarks without Additional Cost
by: Huang, Yangyu, et al.
Published: (2022)

Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt
by: Zhou, Han, et al.
Published: (2023)

ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment
by: Huang, Yangyu, et al.
Published: (2021)

Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning
by: Zhang, Jinxu
Published: (2024)

FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation
by: Li, Zhenghua, et al.
Published: (2025)

Re-ranking the Context for Multimodal Retrieval Augmented Generation
by: Mortaheb, Matin, et al.
Published: (2025)

RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance
by: Mortaheb, Matin, et al.
Published: (2025)

Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
by: Fu, Junchen, et al.
Published: (2024)

MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
by: Zhou, Junjie, et al.
Published: (2025)

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)

Self-supervised Learning of Rotation-invariant 3D Point Set Features using Transformer and its Self-distillation
by: Furuya, Takahiko, et al.
Published: (2023)

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
by: Zhang, Pingping, et al.
Published: (2024)

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
by: De Nadai, Marco, et al.
Published: (2025)

E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)

Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding
by: Li, Da, et al.
Published: (2025)

RAGE for the Machine: Image Compression with Low-Cost Random Access for Embedded Applications
by: Rask, Christian D., et al.
Published: (2024)

CLAS: A Machine Learning Enhanced Framework for Exploring Large 3D Design Datasets
by: Zhang, XiuYu, et al.
Published: (2024)

Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction
by: Zhang, Yao, et al.
Published: (2026)

GenMRP: A Generative Multi-Route Planning Framework for Efficient and Personalized Real-Time Industrial Navigation
by: Wang, Chengzhang, et al.
Published: (2026)

A Resource-Efficient Training Framework for Remote Sensing Text-Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)

IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
by: Fu, Junchen, et al.
Published: (2024)

LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation
by: Wu, Wangyu, et al.
Published: (2025)

DEMO: A Statistical Perspective for Efficient Image-Text Matching
by: Zhang, Fan, et al.
Published: (2024)

Efficient Logic Gate Networks for Video Copy Detection
by: Fojcik, Katarzyna
Published: (2026)

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
by: Zhang, Longxiang, et al.
Published: (2026)

DSparsE: Dynamic Sparse Embedding for Knowledge Graph Completion
by: Yang, Chuhong, et al.
Published: (2024)

Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment
by: Wang, Hongyi, et al.
Published: (2025)

Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching
by: Ma, Xiang, et al.
Published: (2024)

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues
by: Fu, Junchen, et al.
Published: (2026)

LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS
by: Liu, Xinyu, et al.
Published: (2024)

EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
by: Meng, GuangHao, et al.
Published: (2025)

AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning
by: Yang, Enneng, et al.
Published: (2022)

Multimodal Language Models for Domain-Specific Procedural Video Summarization
by: Hussain, Nafisa
Published: (2024)

HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)

PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
by: Zhang, Wenyi, et al.
Published: (2025)

LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts
by: Cai, Qifeng, et al.
Published: (2025)

Visualization of Knowledge Graphs with Embeddings: an Essay on Recent Trends and Methods
by: Riva, Davide, et al.
Published: (2024)

UniNote: A Unified Embedding Model for Multimodal Representation and Ranking
by: Zhao, Jinghan, et al.
Published: (2026)

U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
by: Li, Xiaojie, et al.
Published: (2025)