:: Library Catalog

Bibliographic Details
Main Authors:	Khan, Faizan Farooq, Stojnić, Vladan, Laskar, Zakaria, Elhoseiny, Mohamed, Tolias, Giorgos
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.00177

Similar Items

Label Propagation for Zero-shot Classification with Vision-Language Models
by: Stojnić, Vladan, et al.
Published: (2024)

Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?
by: Aravanis, Tilemachos, et al.
Published: (2026)

LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
by: Stojnić, Vladan, et al.
Published: (2025)

ILIAS: Instance-Level Image retrieval At Scale
by: Kordopatis-Zilos, Giorgos, et al.
Published: (2025)

Composed Image Retrieval for Training-Free Domain Conversion
by: Efthymiadis, Nikos, et al.
Published: (2024)

Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
by: Ramos, Ryan, et al.
Published: (2025)

Instance-Level Generation for Representation Learning
by: Wu, Yankun, et al.
Published: (2025)

How Well Can Vision Language Models See Image Details?
by: Gou, Chenhui, et al.
Published: (2024)

Neural Catalog: Scaling Species Recognition with Catalog of Life-Augmented Generation
by: Khan, Faizan Farooq, et al.
Published: (2025)

AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
by: Suma, Pavel, et al.
Published: (2024)

Step-by-step Layered Design Generation
by: Khan, Faizan Farooq, et al.
Published: (2025)

Indexing Multimodal Language Models for Large-scale Image Retrieval
by: Tharwat, Bahey, et al.
Published: (2026)

Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization
by: Efthymiadis, Nikos, et al.
Published: (2024)

ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
by: Suma, Pavel, et al.
Published: (2026)

AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art
by: Khan, Faizan Farooq, et al.
Published: (2024)

A Dataset for Semantic Segmentation in the Presence of Unknowns
by: Laskar, Zakaria, et al.
Published: (2025)

Composed Image Retrieval for Remote Sensing
by: Psomas, Bill, et al.
Published: (2024)

ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
by: Abdelrahman, Eslam, et al.
Published: (2023)

VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
by: Li, Xiang, et al.
Published: (2024)

FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
by: Khan, Faizan Farooq, et al.
Published: (2025)

Instance-Level Composed Image Retrieval
by: Psomas, Bill, et al.
Published: (2025)

CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery
by: Rongali, Sai Bhargav, et al.
Published: (2024)

Can Diffusion Models Bridge the Domain Gap in Cardiac MR Imaging?
by: Wong, Xin Ci, et al.
Published: (2025)

LOCORE: Image Re-ranking with Long-Context Sequence Modeling
by: Xiao, Zilin, et al.
Published: (2025)

Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback
by: Niu, Xuexiang, et al.
Published: (2024)

Three Things to Know about Deep Metric Learning
by: Patel, Yash, et al.
Published: (2024)

StoryGPT-V: Large Language Models as Consistent Story Visualizers
by: Shen, Xiaoqian, et al.
Published: (2023)

Benchmarking Composed Image Retrieval for Applied Earth Observation
by: Psomas, Bill, et al.
Published: (2026)

PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education
by: M, Megha Mariam K., et al.
Published: (2026)

Scaling Down Text Encoders of Text-to-Image Diffusion Models
by: Wang, Lifu, et al.
Published: (2025)

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
by: Kim, Mingyeong, et al.
Published: (2026)

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
by: Zhuo, Le, et al.
Published: (2025)

Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)

Domain-Aware Continual Zero-Shot Learning
by: Yi, Kai, et al.
Published: (2021)

SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation
by: Kombol, Naomi, et al.
Published: (2026)

Global-Aware Edge Prioritization for Pose Graph Initialization
by: Wei, Tong, et al.
Published: (2026)

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
by: Shen, Xiaoqian, et al.
Published: (2025)

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
by: Chen, Jun, et al.
Published: (2022)

Bridging the Domain Gap for Flight-Ready Spaceborne Vision
by: Park, Tae Ha, et al.
Published: (2024)

One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
by: Li, Senmao, et al.
Published: (2025)