Bibliographic Details
Main Authors:	Huang, Xin; Li, Ruibin; Jia, Tong; Zheng, Wei; Wang, Ya
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2505.15576

Similar Items

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models
by: Pham, Hai X., et al.
Published: (2026)

Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition
by: Tang, Wei, et al.
Published: (2025)

Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
by: Chu, Xu, et al.
Published: (2025)

Semantic Compositions Enhance Vision-Language Contrastive Learning
by: Aladago, Maxwell, et al.
Published: (2024)

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
by: Lan, Zhibin, et al.
Published: (2025)

Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model
by: Shi, Jiang-Xin, et al.
Published: (2024)

Negative Label Guided OOD Detection with Pretrained Vision-Language Models
by: Jiang, Xue, et al.
Published: (2024)

Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)

ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
by: Zhao, Yanpeng, et al.
Published: (2026)

EasyARC: Evaluating Vision Language Models on True Visual Reasoning
by: Unsal, Mert, et al.
Published: (2025)

Learning without Forgetting for Vision-Language Models
by: Zhou, Da-Wei, et al.
Published: (2023)

Vision-Language Models are Strong Noisy Label Detectors
by: Wei, Tong, et al.
Published: (2024)

Ultrasound Vision-Language Alignment via Contrastive Learning
by: Lyu, Zhuoyang, et al.
Published: (2026)

Adaptive Global and Fine-Grained Perceptual Fusion for MLLM Embeddings Compatible with Hard Negative Amplification
by: Hu, Lexiang, et al.
Published: (2026)

Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
by: Yang, Yi, et al.
Published: (2024)

Visual Adaptive Prompting for Compositional Zero-Shot Learning
by: Stein, Kyle, et al.
Published: (2025)

AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models
by: Zhang, Yabin, et al.
Published: (2024)

Rehabilitation Exercise Quality Assessment through Supervised Contrastive Learning with Hard and Soft Negatives
by: Karlov, Mark, et al.
Published: (2024)

FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models
by: Huang, Chenyu, et al.
Published: (2026)

Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
by: Roy, Shuvendu, et al.
Published: (2024)

Multi-Label Contrastive Learning for Abstract Visual Reasoning
by: Małkiński, Mikołaj, et al.
Published: (2020)

Probabilistic Contrastive Learning for Long-Tailed Visual Recognition
by: Du, Chaoqun, et al.
Published: (2024)

PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
by: Chen, Zining, et al.
Published: (2024)

Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models
by: Lee, Jihoon, et al.
Published: (2025)

Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
by: Wu, Shengguang, et al.
Published: (2025)

Adaptive Multi-head Contrastive Learning
by: Wang, Lei, et al.
Published: (2023)

Improve Multi-Modal Embedding Learning via Explicit Hard Negative Gradient Amplifying
by: Xue, Youze, et al.
Published: (2025)

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)

LetheViT: Selective Machine Unlearning for Vision Transformers via Attention-Guided Contrastive Learning
by: Tong, Yujia, et al.
Published: (2025)

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by: Li, Ling, et al.
Published: (2024)

Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
by: Balmaseda, Vicente, et al.
Published: (2025)

ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models
by: Chahe, Amirhosein, et al.
Published: (2025)

Compositional Entailment Learning for Hyperbolic Vision-Language Models
by: Pal, Avik, et al.
Published: (2024)

NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
by: Pan, Bikang, et al.
Published: (2024)

Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples
by: Jo, Dae Ung, et al.
Published: (2024)

Personalized Vision via Visual In-Context Learning
by: Jiang, Yuxin, et al.
Published: (2025)

On the Domain Robustness of Contrastive Vision-Language Models
by: Koddenbrock, Mario, et al.
Published: (2025)

Bi-MCQ: Reformulating Vision-Language Alignment for Negation Understanding
by: Kim, Tae Hun, et al.
Published: (2026)

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
by: Shen, Junhao, et al.
Published: (2025)

The Dual Mechanisms of Spatial Reasoning in Vision-Language Models
by: Cui, Kelly, et al.
Published: (2026)