| Main Authors: | Hou, Bohan, Lin, Haoqiang, Wen, Haokun, Liu, Meng, Xu, Mingzhu, Song, Xuemeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.06001 |
Similar Items
A Comprehensive Survey on Composed Image Retrieval
by: Song, Xuemeng, et al.
Published: (2025)
Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval
by: Lin, Haoqiang, et al.
Published: (2025)
UniCVR: From Alignment to Reranking for Unified Zero-Shot Composed Visual Retrieval
by: Wen, Haokun, et al.
Published: (2026)
FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning
by: Wen, Haokun, et al.
Published: (2026)
HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval
by: Chen, Zhiwei, et al.
Published: (2025)
Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval
by: Yang, Yuxin, et al.
Published: (2026)
Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data
by: Duan, Yiqun, et al.
Published: (2025)
TALDS-Net: Task-Aware Adaptive Local Descriptors Selection for Few-shot Image Classification
by: Qiao, Qian, et al.
Published: (2023)
Selective Vision-Language Subspace Projection for Few-shot CLIP
by: Zhu, Xingyu, et al.
Published: (2024)
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
by: Lin, Haokun, et al.
Published: (2024)
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
by: Xu, Yifang, et al.
Published: (2025)
TraveLLaMA: A Multimodal Travel Assistant with Large-Scale Dataset and Structured Reasoning
by: Chu, Meng, et al.
Published: (2025)
PMPGuard: Catching Pseudo-Matched Pairs in Remote Sensing Image-Text Retrieval
by: Ouyang, Pengxiang, et al.
Published: (2025)
Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval
by: Wen, Haokun, et al.
Published: (2023)
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
by: Yang, Danni, et al.
Published: (2024)
Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing
by: Chen, Yaru, et al.
Published: (2025)
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
by: Zhou, Jinxing, et al.
Published: (2024)
A Simple Task-aware Contrastive Local Descriptor Selection Strategy for Few-shot Learning between inter class and intra class
by: Qiao, Qian, et al.
Published: (2024)
Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval
by: Wen, Haokun, et al.
Published: (2024)
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition
by: Lian, Zheng, et al.
Published: (2023)
Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition
by: Gan, Yaozong, et al.
Published: (2024)
PRVR: Partially Relevant Video Retrieval
by: Chen, Xianke, et al.
Published: (2022)
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions
by: Lin, Honglin, et al.
Published: (2024)
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
by: Jiang, Xin, et al.
Published: (2024)
GAIA: Zero-shot Talking Avatar Generation
by: He, Tianyu, et al.
Published: (2023)
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval
by: Hu, Xiaowan, et al.
Published: (2024)
Visual Autoregressive Modeling for Instruction-Guided Image Editing
by: Mao, Qingyang, et al.
Published: (2025)
Composing Concepts from Images and Videos via Concept-prompt Binding
by: Kong, Xianghao, et al.
Published: (2025)
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
by: Song, Zijie, et al.
Published: (2023)
TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration
by: Shi, Xiaoyu, et al.
Published: (2025)
Memory-enhanced Retrieval Augmentation for Long Video Understanding
by: Yuan, Huaying, et al.
Published: (2025)
Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization
by: Xu, Yu, et al.
Published: (2024)
Fine-grained Image Retrieval via Dual-Vision Adaptation
by: Jiang, Xin, et al.
Published: (2025)
OT-DETECTOR: Delving into Optimal Transport for Zero-shot Out-of-Distribution Detection
by: Liu, Yu, et al.
Published: (2025)
CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
by: Yu, Hang, et al.
Published: (2025)
Deep Reversible Consistency Learning for Cross-modal Retrieval
by: Pu, Ruitao, et al.
Published: (2025)
KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection
by: Li, Xingyuan, et al.
Published: (2025)
QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition
by: Song, Youzhe, et al.
Published: (2023)
Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval
by: Wang, Qing, et al.
Published: (2025)
Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2026)