| Main Authors: | Zhang, Ying, Guo, Shuai, Sun, Chenxi, Zhu, Yuchen, Xiang, Jinhai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.04938 |
Similar Items
Tetrahedron-Net for Medical Image Registration
by: Xiang, Jinhai, et al.
Published: (2025)
GMM-Based Comprehensive Feature Extraction and Relative Distance Preservation For Few-Shot Cross-Modal Retrieval
by: Sun, Chengsong, et al.
Published: (2025)
PHPQ: Pyramid Hybrid Pooling Quantization for Efficient Fine-Grained Image Retrieval
by: Zeng, Ziyun, et al.
Published: (2021)
Prototype-Driven Structure Synergy Network for Remote Sensing Images Segmentation
by: Wang, Junyi, et al.
Published: (2025)
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching
by: Ma, Xiang, et al.
Published: (2024)
PCFEx: Point Cloud Feature Extraction for Graph Neural Networks
by: Masud, Abdullah Al, et al.
Published: (2026)
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)
Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval
by: Berriche, Aymene, et al.
Published: (2024)
A Flexible and Scalable Framework for Video Moment Search
by: Zhang, Chongzhi, et al.
Published: (2025)
FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation
by: Li, Zhenghua, et al.
Published: (2025)
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
by: Yang, Yuchen, et al.
Published: (2024)
Scalable Residual Feature Aggregation Framework with Hybrid Metaheuristic Optimization for Robust Early Pancreatic Neoplasm Detection in Multimodal CT Imaging
by: Thiruvengadam, Janani Annur, et al.
Published: (2025)
Offline Evaluation of Set-Based Text-to-Image Generation
by: Arabzadeh, Negar, et al.
Published: (2024)
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
by: Deng, Chenlong, et al.
Published: (2026)
DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition
by: Xu, Yiyan, et al.
Published: (2025)
EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis
by: Yang, Ruijie, et al.
Published: (2024)
Leveraging Foundation Models for Content-Based Image Retrieval in Radiology
by: Denner, Stefan, et al.
Published: (2024)
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification
by: Wang, Yuhao, et al.
Published: (2024)
RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation
by: Ling, Run, et al.
Published: (2025)
Interactive Mars Image Content-Based Search with Interpretable Machine Learning
by: Vasu, Bhavan, et al.
Published: (2024)
CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
by: Sun, Zelong, et al.
Published: (2025)
ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval
by: Zhang, Zhuocheng, et al.
Published: (2026)
Rethinking Detection Based Table Structure Recognition for Visually Rich Document Images
by: Xiao, Bin, et al.
Published: (2023)
A Novel Evaluation Framework for Image2Text Generation
by: Huang, Jia-Hong, et al.
Published: (2024)
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
by: Sun, Zengbao, et al.
Published: (2024)
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
by: Guo, Hao, et al.
Published: (2025)
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
by: Tu, Rong-Cheng, et al.
Published: (2025)
A Collaborative Jade Recognition System for Mobile Devices Based on Lightweight and Large Models
by: Wang, Zhenyu, et al.
Published: (2025)
FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding
by: Feng, Kaidong, et al.
Published: (2026)
Semi-Supervised Image-Based Narrative Extraction: A Case Study with Historical Photographic Records
by: German, Fausto, et al.
Published: (2025)
Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs
by: Zhang, Huaying, et al.
Published: (2024)
ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence
by: Shi, Zhuofan, et al.
Published: (2026)
Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning
by: Lu, Yingling, et al.
Published: (2024)
A Resource-Efficient Training Framework for Remote Sensing Text-Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)
Optimizing Multi-Modal Models for Image-Based Shape Retrieval: The Role of Pre-Alignment and Hard Contrastive Learning
by: Kühn, Paul Julius, et al.
Published: (2026)
Chain-of-Thought Re-ranking for Image Retrieval Tasks
by: Wu, Shangrong, et al.
Published: (2025)
DEMO: A Statistical Perspective for Efficient Image-Text Matching
by: Zhang, Fan, et al.
Published: (2024)
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)
RDP: Ranked Differential Privacy for Facial Feature Protection in Multiscale Sparsified Subspace
by: Ou, Lu, et al.
Published: (2024)
Entity Image and Mixed-Modal Image Retrieval Datasets
by: Blaga, Cristian-Ioan, et al.
Published: (2025)