| Main Authors: | Zhang, Qi, Chen, Yuxu, Deng, Lei, Shen, Lili |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.17178 |
Similar Items
Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching
by: Ge, Xuri, et al.
Published: (2024)
DEMO: A Statistical Perspective for Efficient Image-Text Matching
by: Zhang, Fan, et al.
Published: (2024)
A Resource-Efficient Training Framework for Remote Sensing Text--Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching
by: Ma, Xiang, et al.
Published: (2024)
Ambiguity-Aware and High-Order Relation Learning for Multi-Grained Image-Text Matching
by: Chen, Junyu, et al.
Published: (2025)
MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval
by: Sogi, Naoya, et al.
Published: (2025)
Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image Retrieval
by: Long, Zijun, et al.
Published: (2025)
ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval
by: Zhang, Zhuocheng, et al.
Published: (2026)
A Novel Evaluation Framework for Image2Text Generation
by: Huang, Jia-Hong, et al.
Published: (2024)
Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images
by: Xu, Shicheng, et al.
Published: (2023)
Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)
CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
by: Sun, Zelong, et al.
Published: (2025)
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement
by: Hou, Danyang, et al.
Published: (2024)
Visual Zero-Shot E-Commerce Product Attribute Value Extraction
by: Gong, Jiaying, et al.
Published: (2025)
Towards Text-Image Interleaved Retrieval
by: Zhang, Xin, et al.
Published: (2025)
ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
by: Mao, Chen, et al.
Published: (2024)
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
by: Meng, GuangHao, et al.
Published: (2025)
DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition
by: Xu, Yiyan, et al.
Published: (2025)
Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval
by: Lu, Xin, et al.
Published: (2023)
Offline Evaluation of Set-Based Text-to-Image Generation
by: Arabzadeh, Negar, et al.
Published: (2024)
Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
by: Xu, Yexing, et al.
Published: (2026)
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)
MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
by: Zhou, Junjie, et al.
Published: (2025)
Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)
AutothinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction
by: Yang, Jiashu, et al.
Published: (2026)
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
by: Deng, Chenlong, et al.
Published: (2026)
TIGER-FG: Text-Guided Implicit Fine-Grained Grounding for E-commerce Retrieval
by: Sun, Xinyu, et al.
Published: (2026)
FIGROTD: A Friendly-to-Handle Dataset for Image Guided Retrieval with Optional Text
by: Le, Hoang-Bao, et al.
Published: (2025)
Language-only Efficient Training of Zero-shot Composed Image Retrieval
by: Gu, Geonmo, et al.
Published: (2023)
Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)
VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
by: Giahi, Ramin, et al.
Published: (2025)
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
by: Byun, Jaeseok, et al.
Published: (2024)
DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval
by: Yang, Yuxin, et al.
Published: (2025)
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
by: Jing, Xiaolun, et al.
Published: (2024)
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
by: Sun, Zengbao, et al.
Published: (2024)
A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback
by: Khaertdinov, Bulat, et al.
Published: (2025)
Automatic Creative Selection with Cross-Modal Matching
by: Kim, Alex, et al.
Published: (2024)
Iterative Optimal Attention and Local Model for Single Image Rain Streak Removal
by: Li, Xiangyu, et al.
Published: (2025)
Image Outlier Detection Without Training using RANSAC
by: Tsai, Chen-Han, et al.
Published: (2023)
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
by: Yu, Xuzheng, et al.
Published: (2024)