| Main Authors: | Khan, Faizan Farooq, Stojnić, Vladan, Laskar, Zakaria, Elhoseiny, Mohamed, Tolias, Giorgos |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.00177 |
Similar Items
Label Propagation for Zero-shot Classification with Vision-Language Models
by: Stojnić, Vladan, et al.
Published: (2024)
Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?
by: Aravanis, Tilemachos, et al.
Published: (2026)
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
by: Stojnić, Vladan, et al.
Published: (2025)
ILIAS: Instance-Level Image retrieval At Scale
by: Kordopatis-Zilos, Giorgos, et al.
Published: (2025)
Composed Image Retrieval for Training-Free Domain Conversion
by: Efthymiadis, Nikos, et al.
Published: (2024)
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
by: Ramos, Ryan, et al.
Published: (2025)
Instance-Level Generation for Representation Learning
by: Wu, Yankun, et al.
Published: (2025)
How Well Can Vision Language Models See Image Details?
by: Gou, Chenhui, et al.
Published: (2024)
Neural Catalog: Scaling Species Recognition with Catalog of Life-Augmented Generation
by: Khan, Faizan Farooq, et al.
Published: (2025)
AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
by: Suma, Pavel, et al.
Published: (2024)
Step-by-step Layered Design Generation
by: Khan, Faizan Farooq, et al.
Published: (2025)
Indexing Multimodal Language Models for Large-scale Image Retrieval
by: Tharwat, Bahey, et al.
Published: (2026)
Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization
by: Efthymiadis, Nikos, et al.
Published: (2024)
ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
by: Suma, Pavel, et al.
Published: (2026)
AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art
by: Khan, Faizan Farooq, et al.
Published: (2024)
A Dataset for Semantic Segmentation in the Presence of Unknowns
by: Laskar, Zakaria, et al.
Published: (2025)
Composed Image Retrieval for Remote Sensing
by: Psomas, Bill, et al.
Published: (2024)
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
by: Abdelrahman, Eslam, et al.
Published: (2023)
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
by: Li, Xiang, et al.
Published: (2024)
FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
by: Khan, Faizan Farooq, et al.
Published: (2025)
Instance-Level Composed Image Retrieval
by: Psomas, Bill, et al.
Published: (2025)
CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery
by: Rongali, Sai Bhargav, et al.
Published: (2024)
Can Diffusion Models Bridge the Domain Gap in Cardiac MR Imaging?
by: Wong, Xin Ci, et al.
Published: (2025)
LOCORE: Image Re-ranking with Long-Context Sequence Modeling
by: Xiao, Zilin, et al.
Published: (2025)
Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback
by: Niu, Xuexiang, et al.
Published: (2024)
Three Things to Know about Deep Metric Learning
by: Patel, Yash, et al.
Published: (2024)
StoryGPT-V: Large Language Models as Consistent Story Visualizers
by: Shen, Xiaoqian, et al.
Published: (2023)
Benchmarking Composed Image Retrieval for Applied Earth Observation
by: Psomas, Bill, et al.
Published: (2026)
PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education
by: M, Megha Mariam K., et al.
Published: (2026)
Scaling Down Text Encoders of Text-to-Image Diffusion Models
by: Wang, Lifu, et al.
Published: (2025)
Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
by: Kim, Mingyeong, et al.
Published: (2026)
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
by: Zhuo, Le, et al.
Published: (2025)
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)
Domain-Aware Continual Zero-Shot Learning
by: Yi, Kai, et al.
Published: (2021)
SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation
by: Kombol, Naomi, et al.
Published: (2026)
Global-Aware Edge Prioritization for Pose Graph Initialization
by: Wei, Tong, et al.
Published: (2026)
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
by: Shen, Xiaoqian, et al.
Published: (2025)
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
by: Chen, Jun, et al.
Published: (2022)
Bridging the Domain Gap for Flight-Ready Spaceborne Vision
by: Park, Tae Ha, et al.
Published: (2024)
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
by: Li, Senmao, et al.
Published: (2025)