| Main Authors: | Dordevic, Danilo, Kumar, Suryansh |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.01082 |
Similar Items
Revisiting Uncertainty: On Evidential Learning for Partially Relevant Video Retrieval
by: Li, Jun, et al.
Published: (2026)
FOR: Finetuning for Object Level Open Vocabulary Image Retrieval
by: Levi, Hila, et al.
Published: (2024)
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
by: Sogi, Naoya, et al.
Published: (2024)
Hierarchy-of-Visual-Words: a Learning-based Approach for Trademark Image Retrieval
by: Lourenço, Vítor N., et al.
Published: (2019)
Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
by: Liu, Zheyuan, et al.
Published: (2023)
Multi-event Video-Text Retrieval
by: Zhang, Gengyuan, et al.
Published: (2023)
Embedding-based Retrieval in Multimodal Content Moderation
by: Liang, Hanzhong, et al.
Published: (2025)
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
by: Zhu, Tianyu, et al.
Published: (2024)
NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
by: Liu, Zhuchenyang, et al.
Published: (2026)
DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models
by: Wang, Yimu, et al.
Published: (2024)
Online Learning via Memory: Retrieval-Augmented Detector Adaptation
by: Jian, Yanan, et al.
Published: (2024)
Metric Compatible Training for Online Backfilling in Large-Scale Retrieval
by: Seo, Seonguk, et al.
Published: (2023)
Open Multimodal Retrieval-Augmented Factual Image Generation
by: Tian, Yang, et al.
Published: (2025)
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval
by: Xu, Yifan, et al.
Published: (2024)
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
by: Dave, Ishan Rajendrakumar, et al.
Published: (2024)
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
by: Alomari, Hani, et al.
Published: (2025)
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
by: Zhao, Pengcheng, et al.
Published: (2025)
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
by: Sarkar, Rohan, et al.
Published: (2024)
Re-ranking the Context for Multimodal Retrieval Augmented Generation
by: Mortaheb, Matin, et al.
Published: (2025)
RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance
by: Mortaheb, Matin, et al.
Published: (2025)
Transfer Learning with Self-Supervised Vision Transformers for Snake Identification
by: Miyaguchi, Anthony, et al.
Published: (2024)
Multi-Label Plant Species Classification with Self-Supervised Vision Transformers
by: Gustineli, Murilo, et al.
Published: (2024)
Improving Visual Recommendation on E-commerce Platforms Using Vision-Language Models
by: Yada, Yuki, et al.
Published: (2025)
PICS: Pipeline for Image Captioning and Search
by: Rosario, Grant, et al.
Published: (2024)
Multi-Label Plant Species Prediction with Metadata-Enhanced Multi-Head Vision Transformers
by: Herasimchyk, Hanna, et al.
Published: (2025)
Image Fusion for Cross-Domain Sequential Recommendation
by: Wu, Wangyu, et al.
Published: (2024)
Image Outlier Detection Without Training using RANSAC
by: Tsai, Chen-Han, et al.
Published: (2023)
Sustainable transparency in Recommender Systems: Bayesian Ranking of Images for Explainability
by: Paz-Ruza, Jorge, et al.
Published: (2023)
Image Hashing via Cross-View Code Alignment in the Age of Foundation Models
by: Moummad, Ilyass, et al.
Published: (2025)
Towards Resource-Efficient Streaming of Large-Scale Medical Image Datasets for Deep Learning
by: Kulkarni, Pranav, et al.
Published: (2023)
Understanding the Performance Plateau in Text-to-Video Retrieval: A Comprehensive Empirical and Linguistic Analysis
by: Pegia, Maria-Eirini, et al.
Published: (2026)
Revisit Anything: Visual Place Recognition via Image Segment Retrieval
by: Garg, Kartik, et al.
Published: (2024)
Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language Models
by: Williams-Lekuona, Mikel, et al.
Published: (2025)
Siamese Content-based Search Engine for a More Transparent Skin and Breast Cancer Diagnosis through Histological Imaging
by: Tabatabaei, Zahra, et al.
Published: (2024)
CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing
by: Doan, Khoa D., et al.
Published: (2022)
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval
by: Deanda, Demetrio, et al.
Published: (2025)
Reasoning-Augmented Representations for Multimodal Retrieval
by: Zhang, Jianrui, et al.
Published: (2026)
Multimedia-Aware Question Answering: A Review of Retrieval and Cross-Modal Reasoning Architectures
by: Raja, Rahul, et al.
Published: (2025)
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
by: Sarwar, Nobin
Published: (2025)
TabRAG: Improving Tabular Document Question Answering for Retrieval Augmented Generation via Structured Representations
by: Si, Jacob, et al.
Published: (2025)