| Main Authors: | Ypsilantis, Nikolaos-Antonios, Chen, Kaifeng, Araujo, André, Chum, Ondřej |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.12137 |
Similar Items
UDON: Universal Dynamic Online distillatioN for generic image representations
by: Ypsilantis, Nikolaos-Antonios, et al.
Published: (2024)
Co-Segmentation without any Pixel-level Supervision with Application to Large-Scale Sketch Classification
by: Ypsilantis, Nikolaos-Antonios, et al.
Published: (2024)
ILIAS: Instance-Level Image retrieval At Scale
by: Kordopatis-Zilos, Giorgos, et al.
Published: (2025)
Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning
by: Mohwald, Albert, et al.
Published: (2023)
Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization
by: Efthymiadis, Nikos, et al.
Published: (2024)
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
by: Zhou, Shuchang, et al.
Published: (2025)
Composed Image Retrieval for Training-Free Domain Conversion
by: Efthymiadis, Nikos, et al.
Published: (2024)
Composed Image Retrieval for Remote Sensing
by: Psomas, Bill, et al.
Published: (2024)
Global-to-Local or Local-to-Global? Enhancing Image Retrieval with Efficient Local Search and Effective Global Re-ranking
by: Aiger, Dror, et al.
Published: (2025)
CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization
by: Kritikos, Antonios, et al.
Published: (2026)
Instance-Level Composed Image Retrieval
by: Psomas, Bill, et al.
Published: (2025)
Learning Vision from Models Rivals Learning Vision from Data
by: Tian, Yonglong, et al.
Published: (2023)
Visual RAG: Expanding MLLM visual knowledge without fine-tuning
by: Bonomo, Mirco, et al.
Published: (2025)
Benchmarking Composed Image Retrieval for Applied Earth Observation
by: Psomas, Bill, et al.
Published: (2026)
Demographic-aware fine-grained visual recognition of pediatric wrist pathologies
by: Ahmed, Ammar, et al.
Published: (2025)
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
by: Wang, Chaoyang, et al.
Published: (2025)
Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization
by: Aiger, Dror, et al.
Published: (2023)
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
by: Jing, Liqiang, et al.
Published: (2023)
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)
Koo-Fu CLIP: Closed-Form Adaptation of Vision-Language Models via Fukunaga-Koontz Linear Discriminant Analysis
by: Suchanek, Matej, et al.
Published: (2026)
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
by: Cao, Bingyi, et al.
Published: (2026)
Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models
by: Sun, Yuhao, et al.
Published: (2026)
LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models
by: Gkalelis, Nikolaos, et al.
Published: (2026)
Adapting Vision-Language Model with Fine-grained Semantics for Open-Vocabulary Segmentation
by: Chng, Yong Xien, et al.
Published: (2024)
Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces
by: Chen, Zhiling, et al.
Published: (2024)
Large Language Models estimate fine-grained human color-concept associations
by: Mukherjee, Kushin, et al.
Published: (2024)
An Inpainting-Infused Pipeline for Attire and Background Replacement
by: Perche-Mahlow, Felipe Rodrigues, et al.
Published: (2024)
DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation
by: Chen, Zhiqin, et al.
Published: (2023)
Is CLIP the main roadblock for fine-grained open-world perception?
by: Bianchi, Lorenzo, et al.
Published: (2024)
fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)
Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
by: Zhang, Wenyao, et al.
Published: (2025)
CARE: Confidence-Aware Regression Estimation of building density fine-tuning EO Foundation Models
by: Dionelis, Nikolaos, et al.
Published: (2025)
Context-Infused Visual Grounding for Art
by: Khan, Selina, et al.
Published: (2024)
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
by: Wang, Ruiyu, et al.
Published: (2025)
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
by: Thoker, Fida Mohammad, et al.
Published: (2025)
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention
by: Liu, Ying, et al.
Published: (2024)
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
by: Jing, Liqiang, et al.
Published: (2024)
Specificity-aware reinforcement learning for fine-grained open-world classification
by: Angheben, Samuele, et al.
Published: (2026)
SynopticBench: Evaluating Vision-Language Models on Generating Weather Forecast Discussions of the Future
by: Higgins, Timothy B., et al.
Published: (2026)
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
by: Hong, Wenyi, et al.
Published: (2025)