| Main Authors: | Huang, Xin, Li, Ruibin, Jia, Tong, Zheng, Wei, Wang, Ya |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.15576 |
Similar Items
No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models
by: Pham, Hai X., et al.
Published: (2026)
Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition
by: Tang, Wei, et al.
Published: (2025)
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
by: Chu, Xu, et al.
Published: (2025)
Semantic Compositions Enhance Vision-Language Contrastive Learning
by: Aladago, Maxwell, et al.
Published: (2024)
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
by: Lan, Zhibin, et al.
Published: (2025)
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model
by: Shi, Jiang-Xin, et al.
Published: (2024)
Negative Label Guided OOD Detection with Pretrained Vision-Language Models
by: Jiang, Xue, et al.
Published: (2024)
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)
ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
by: Zhao, Yanpeng, et al.
Published: (2026)
EasyARC: Evaluating Vision Language Models on True Visual Reasoning
by: Unsal, Mert, et al.
Published: (2025)
Learning without Forgetting for Vision-Language Models
by: Zhou, Da-Wei, et al.
Published: (2023)
Vision-Language Models are Strong Noisy Label Detectors
by: Wei, Tong, et al.
Published: (2024)
Ultrasound Vision-Language Alignment via Contrastive Learning
by: Lyu, Zhuoyang, et al.
Published: (2026)
Adaptive Global and Fine-Grained Perceptual Fusion for MLLM Embeddings Compatible with Hard Negative Amplification
by: Hu, Lexiang, et al.
Published: (2026)
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
by: Yang, Yi, et al.
Published: (2024)
Visual Adaptive Prompting for Compositional Zero-Shot Learning
by: Stein, Kyle, et al.
Published: (2025)
AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models
by: Zhang, Yabin, et al.
Published: (2024)
Rehabilitation Exercise Quality Assessment through Supervised Contrastive Learning with Hard and Soft Negatives
by: Karlov, Mark, et al.
Published: (2024)
FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models
by: Huang, Chenyu, et al.
Published: (2026)
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
by: Roy, Shuvendu, et al.
Published: (2024)
Multi-Label Contrastive Learning for Abstract Visual Reasoning
by: Małkiński, Mikołaj, et al.
Published: (2020)
Probabilistic Contrastive Learning for Long-Tailed Visual Recognition
by: Du, Chaoqun, et al.
Published: (2024)
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
by: Chen, Zining, et al.
Published: (2024)
Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models
by: Lee, Jihoon, et al.
Published: (2025)
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
by: Wu, Shengguang, et al.
Published: (2025)
Adaptive Multi-head Contrastive Learning
by: Wang, Lei, et al.
Published: (2023)
Improve Multi-Modal Embedding Learning via Explicit Hard Negative Gradient Amplifying
by: Xue, Youze, et al.
Published: (2025)
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)
LetheViT: Selective Machine Unlearning for Vision Transformers via Attention-Guided Contrastive Learning
by: Tong, Yujia, et al.
Published: (2025)
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by: Li, Ling, et al.
Published: (2024)
Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
by: Balmaseda, Vicente, et al.
Published: (2025)
ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models
by: Chahe, Amirhosein, et al.
Published: (2025)
Compositional Entailment Learning for Hyperbolic Vision-Language Models
by: Pal, Avik, et al.
Published: (2024)
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
by: Pan, Bikang, et al.
Published: (2024)
Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples
by: Jo, Dae Ung, et al.
Published: (2024)
Personalized Vision via Visual In-Context Learning
by: Jiang, Yuxin, et al.
Published: (2025)
On the Domain Robustness of Contrastive Vision-Language Models
by: Koddenbrock, Mario, et al.
Published: (2025)
Bi-MCQ: Reformulating Vision-Language Alignment for Negation Understanding
by: Kim, Tae Hun, et al.
Published: (2026)
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
by: Shen, Junhao, et al.
Published: (2025)
The Dual Mechanisms of Spatial Reasoning in Vision-Language Models
by: Cui, Kelly, et al.
Published: (2026)