:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Qi; Chen, Yuxu; Deng, Lei; Shen, Lili
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Information Retrieval
Online Access:	https://arxiv.org/abs/2512.17178

Similar Items

Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching
by: Ge, Xuri, et al.
Published: (2024)

DEMO: A Statistical Perspective for Efficient Image-Text Matching
by: Zhang, Fan, et al.
Published: (2024)

A Resource-Efficient Training Framework for Remote Sensing Text-Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)

Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching
by: Ma, Xiang, et al.
Published: (2024)

Ambiguity-Aware and High-Order Relation Learning for Multi-Grained Image-Text Matching
by: Chen, Junyu, et al.
Published: (2025)

MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval
by: Sogi, Naoya, et al.
Published: (2025)

Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image Retrieval
by: Long, Zijun, et al.
Published: (2025)

ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval
by: Zhang, Zhuocheng, et al.
Published: (2026)

A Novel Evaluation Framework for Image2Text Generation
by: Huang, Jia-Hong, et al.
Published: (2024)

Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images
by: Xu, Shicheng, et al.
Published: (2023)

Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)

CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
by: Sun, Zelong, et al.
Published: (2025)

Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement
by: Hou, Danyang, et al.
Published: (2024)

Visual Zero-Shot E-Commerce Product Attribute Value Extraction
by: Gong, Jiaying, et al.
Published: (2025)

Towards Text-Image Interleaved Retrieval
by: Zhang, Xin, et al.
Published: (2025)

ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
by: Mao, Chen, et al.
Published: (2024)

EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
by: Meng, GuangHao, et al.
Published: (2025)

DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition
by: Xu, Yiyan, et al.
Published: (2025)

Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval
by: Lu, Xin, et al.
Published: (2023)

Offline Evaluation of Set-Based Text-to-Image Generation
by: Arabzadeh, Negar, et al.
Published: (2024)

Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
by: Xu, Yexing, et al.
Published: (2026)

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)

MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
by: Zhou, Junjie, et al.
Published: (2025)

Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
by: Wang, Yifan, et al.
Published: (2025)

AutothinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction
by: Yang, Jiashu, et al.
Published: (2026)

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
by: Deng, Chenlong, et al.
Published: (2026)

TIGER-FG: Text-Guided Implicit Fine-Grained Grounding for E-commerce Retrieval
by: Sun, Xinyu, et al.
Published: (2026)

FIGROTD: A Friendly-to-Handle Dataset for Image Guided Retrieval with Optional Text
by: Le, Hoang-Bao, et al.
Published: (2025)

Language-only Efficient Training of Zero-shot Composed Image Retrieval
by: Gu, Geonmo, et al.
Published: (2023)

Structural Anchor Pruning: Training-Free Multi-Vector Compression for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)

VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
by: Giahi, Ramin, et al.
Published: (2025)

An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
by: Byun, Jaeseok, et al.
Published: (2024)

DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval
by: Yang, Yuxin, et al.
Published: (2025)

An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
by: Jing, Xiaolun, et al.
Published: (2024)

Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
by: Sun, Zengbao, et al.
Published: (2024)

A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback
by: Khaertdinov, Bulat, et al.
Published: (2025)

Automatic Creative Selection with Cross-Modal Matching
by: Kim, Alex, et al.
Published: (2024)

Iterative Optimal Attention and Local Model for Single Image Rain Streak Removal
by: Li, Xiangyu, et al.
Published: (2025)

Image Outlier Detection Without Training using RANSAC
by: Tsai, Chen-Han, et al.
Published: (2023)

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
by: Yu, Xuzheng, et al.
Published: (2024)