:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Ying; Guo, Shuai; Sun, Chenxi; Zhu, Yuchen; Xiang, Jinhai
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Information Retrieval
Online Access:	https://arxiv.org/abs/2505.04938

Similar Items

Tetrahedron-Net for Medical Image Registration
by: Xiang, Jinhai, et al.
Published: (2025)

GMM-Based Comprehensive Feature Extraction and Relative Distance Preservation For Few-Shot Cross-Modal Retrieval
by: Sun, Chengsong, et al.
Published: (2025)

PHPQ: Pyramid Hybrid Pooling Quantization for Efficient Fine-Grained Image Retrieval
by: Zeng, Ziyun, et al.
Published: (2021)

Prototype-Driven Structure Synergy Network for Remote Sensing Images Segmentation
by: Wang, Junyi, et al.
Published: (2025)

Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching
by: Ma, Xiang, et al.
Published: (2024)

PCFEx: Point Cloud Feature Extraction for Graph Neural Networks
by: Masud, Abdullah Al, et al.
Published: (2026)

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)

Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval
by: Berriche, Aymene, et al.
Published: (2024)

A Flexible and Scalable Framework for Video Moment Search
by: Zhang, Chongzhi, et al.
Published: (2025)

FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation
by: Li, Zhenghua, et al.
Published: (2025)

Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
by: Yang, Yuchen, et al.
Published: (2024)

Scalable Residual Feature Aggregation Framework with Hybrid Metaheuristic Optimization for Robust Early Pancreatic Neoplasm Detection in Multimodal CT Imaging
by: Thiruvengadam, Janani Annur, et al.
Published: (2025)

Offline Evaluation of Set-Based Text-to-Image Generation
by: Arabzadeh, Negar, et al.
Published: (2024)

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
by: Deng, Chenlong, et al.
Published: (2026)

DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition
by: Xu, Yiyan, et al.
Published: (2025)

EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
by: Yang, Ruijie, et al.
Published: (2024)

Leveraging Foundation Models for Content-Based Image Retrieval in Radiology
by: Denner, Stefan, et al.
Published: (2024)

Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification
by: Wang, Yuhao, et al.
Published: (2024)

RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation
by: Ling, Run, et al.
Published: (2025)

Interactive Mars Image Content-Based Search with Interpretable Machine Learning
by: Vasu, Bhavan, et al.
Published: (2024)

CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
by: Sun, Zelong, et al.
Published: (2025)

ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval
by: Zhang, Zhuocheng, et al.
Published: (2026)

Rethinking Detection Based Table Structure Recognition for Visually Rich Document Images
by: Xiao, Bin, et al.
Published: (2023)

A Novel Evaluation Framework for Image2Text Generation
by: Huang, Jia-Hong, et al.
Published: (2024)

Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
by: Sun, Zengbao, et al.
Published: (2024)

Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
by: Guo, Hao, et al.
Published: (2025)

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)

A Collaborative Jade Recognition System for Mobile Devices Based on Lightweight and Large Models
by: Wang, Zhenyu, et al.
Published: (2025)

FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding
by: Feng, Kaidong, et al.
Published: (2026)

Semi-Supervised Image-Based Narrative Extraction: A Case Study with Historical Photographic Records
by: German, Fausto, et al.
Published: (2025)

Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs
by: Zhang, Huaying, et al.
Published: (2024)

ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence
by: Shi, Zhuofan, et al.
Published: (2026)

Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning
by: Lu, Yingling, et al.
Published: (2024)

A Resource-Efficient Training Framework for Remote Sensing Text-Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)

Optimizing Multi-Modal Models for Image-Based Shape Retrieval: The Role of Pre-Alignment and Hard Contrastive Learning
by: Kühn, Paul Julius, et al.
Published: (2026)

Chain-of-Thought Re-ranking for Image Retrieval Tasks
by: Wu, Shangrong, et al.
Published: (2025)

DEMO: A Statistical Perspective for Efficient Image-Text Matching
by: Zhang, Fan, et al.
Published: (2024)

Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)

RDP: Ranked Differential Privacy for Facial Feature Protection in Multiscale Sparsified Subspace
by: Ou, Lu, et al.
Published: (2024)

Entity Image and Mixed-Modal Image Retrieval Datasets
by: Blaga, Cristian-Ioan, et al.
Published: (2025)