| Main Authors: | Zhang, Kai, Chen, Xingyu, Zhang, Xiaofeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.12782 |
Similar Items
CTR-Driven Advertising Image Generation with Multimodal Large Language Models
by: Chen, Xingye, et al.
Published: (2025)
FreeEnricher: Enriching Face Landmarks without Additional Cost
by: Huang, Yangyu, et al.
Published: (2022)
Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt
by: Zhou, Han, et al.
Published: (2023)
ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment
by: Huang, Yangyu, et al.
Published: (2021)
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning
by: Zhang, Jinxu
Published: (2024)
FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation
by: Li, Zhenghua, et al.
Published: (2025)
Re-ranking the Context for Multimodal Retrieval Augmented Generation
by: Mortaheb, Matin, et al.
Published: (2025)
RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance
by: Mortaheb, Matin, et al.
Published: (2025)
Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation
by: Fu, Junchen, et al.
Published: (2024)
MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
by: Zhou, Junjie, et al.
Published: (2025)
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)
Self-supervised Learning of Rotation-invariant 3D Point Set Features using Transformer and its Self-distillation
by: Furuya, Takahiko, et al.
Published: (2023)
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
by: Zhang, Pingping, et al.
Published: (2024)
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
by: De Nadai, Marco, et al.
Published: (2025)
E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)
Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding
by: Li, Da, et al.
Published: (2025)
RAGE for the Machine: Image Compression with Low-Cost Random Access for Embedded Applications
by: Rask, Christian D., et al.
Published: (2024)
CLAS: A Machine Learning Enhanced Framework for Exploring Large 3D Design Datasets
by: Zhang, XiuYu, et al.
Published: (2024)
Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction
by: Zhang, Yao, et al.
Published: (2026)
GenMRP: A Generative Multi-Route Planning Framework for Efficient and Personalized Real-Time Industrial Navigation
by: Wang, Chengzhang, et al.
Published: (2026)
A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)
IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
by: Fu, Junchen, et al.
Published: (2024)
LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation
by: Wu, Wangyu, et al.
Published: (2025)
DEMO: A Statistical Perspective for Efficient Image-Text Matching
by: Zhang, Fan, et al.
Published: (2024)
Efficient Logic Gate Networks for Video Copy Detection
by: Fojcik, Katarzyna
Published: (2026)
Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
by: Zhang, Longxiang, et al.
Published: (2026)
DSparsE: Dynamic Sparse Embedding for Knowledge Graph Completion
by: Yang, Chuhong, et al.
Published: (2024)
Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment
by: Wang, Hongyi, et al.
Published: (2025)
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching
by: Ma, Xiang, et al.
Published: (2024)
Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues
by: Fu, Junchen, et al.
Published: (2026)
LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS
by: Liu, Xinyu, et al.
Published: (2024)
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
by: Meng, GuangHao, et al.
Published: (2025)
AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning
by: Yang, Enneng, et al.
Published: (2022)
Multimodal Language Models for Domain-Specific Procedural Video Summarization
by: Hussain, Nafisa
Published: (2024)
HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)
PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
by: Zhang, Wenyi, et al.
Published: (2025)
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts
by: Cai, Qifeng, et al.
Published: (2025)
Visualization of Knowledge Graphs with Embeddings: an Essay on Recent Trends and Methods
by: Riva, Davide, et al.
Published: (2024)
UniNote: A Unified Embedding Model for Multimodal Representation and Ranking
by: Zhao, Jinghan, et al.
Published: (2026)
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
by: Li, Xiaojie, et al.
Published: (2025)