:: Library Catalog

Bibliographic Details
Main Authors:	Ypsilantis, Nikolaos-Antonios; Chen, Kaifeng; Araujo, André; Chum, Ondřej
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.12137

Similar Items

UDON: Universal Dynamic Online distillatioN for generic image representations
by: Ypsilantis, Nikolaos-Antonios, et al.
Published: (2024)

Co-Segmentation without any Pixel-level Supervision with Application to Large-Scale Sketch Classification
by: Ypsilantis, Nikolaos-Antonios, et al.
Published: (2024)

ILIAS: Instance-Level Image retrieval At Scale
by: Kordopatis-Zilos, Giorgos, et al.
Published: (2025)

Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning
by: Mohwald, Albert, et al.
Published: (2023)

Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization
by: Efthymiadis, Nikos, et al.
Published: (2024)

InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
by: Zhou, Shuchang, et al.
Published: (2025)

Composed Image Retrieval for Training-Free Domain Conversion
by: Efthymiadis, Nikos, et al.
Published: (2024)

Composed Image Retrieval for Remote Sensing
by: Psomas, Bill, et al.
Published: (2024)

Global-to-Local or Local-to-Global? Enhancing Image Retrieval with Efficient Local Search and Effective Global Re-ranking
by: Aiger, Dror, et al.
Published: (2025)

CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization
by: Kritikos, Antonios, et al.
Published: (2026)

Instance-Level Composed Image Retrieval
by: Psomas, Bill, et al.
Published: (2025)

Learning Vision from Models Rivals Learning Vision from Data
by: Tian, Yonglong, et al.
Published: (2023)

Visual RAG: Expanding MLLM visual knowledge without fine-tuning
by: Bonomo, Mirco, et al.
Published: (2025)

Benchmarking Composed Image Retrieval for Applied Earth Observation
by: Psomas, Bill, et al.
Published: (2026)

Demographic-aware fine-grained visual recognition of pediatric wrist pathologies
by: Ahmed, Ammar, et al.
Published: (2025)

Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
by: Wang, Chaoyang, et al.
Published: (2025)

Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization
by: Aiger, Dror, et al.
Published: (2023)

FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
by: Jing, Liqiang, et al.
Published: (2023)

Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)

Koo-Fu CLIP: Closed-Form Adaptation of Vision-Language Models via Fukunaga-Koontz Linear Discriminant Analysis
by: Suchanek, Matej, et al.
Published: (2026)

TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
by: Cao, Bingyi, et al.
Published: (2026)

Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models
by: Sun, Yuhao, et al.
Published: (2026)

LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models
by: Gkalelis, Nikolaos, et al.
Published: (2026)

Adapting Vision-Language Model with Fine-grained Semantics for Open-Vocabulary Segmentation
by: Chng, Yong Xien, et al.
Published: (2024)

Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces
by: Chen, Zhiling, et al.
Published: (2024)

Large Language Models estimate fine-grained human color-concept associations
by: Mukherjee, Kushin, et al.
Published: (2024)

An Inpainting-Infused Pipeline for Attire and Background Replacement
by: Perche-Mahlow, Felipe Rodrigues, et al.
Published: (2024)

DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation
by: Chen, Zhiqin, et al.
Published: (2023)

Is CLIP the main roadblock for fine-grained open-world perception?
by: Bianchi, Lorenzo, et al.
Published: (2024)

fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
by: Sharma, Saurav, et al.
Published: (2025)

Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation
by: Zhang, Wenyao, et al.
Published: (2025)

CARE: Confidence-Aware Regression Estimation of building density fine-tuning EO Foundation Models
by: Dionelis, Nikolaos, et al.
Published: (2025)

Context-Infused Visual Grounding for Art
by: Khan, Selina, et al.
Published: (2024)

Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
by: Wang, Ruiyu, et al.
Published: (2025)

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
by: Thoker, Fida Mohammad, et al.
Published: (2025)

Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention
by: Liu, Ying, et al.
Published: (2024)

FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
by: Jing, Liqiang, et al.
Published: (2024)

Specificity-aware reinforcement learning for fine-grained open-world classification
by: Angheben, Samuele, et al.
Published: (2026)

SynopticBench: Evaluating Vision-Language Models on Generating Weather Forecast Discussions of the Future
by: Higgins, Timothy B., et al.
Published: (2026)

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
by: Hong, Wenyi, et al.
Published: (2025)