Library Catalog

Bibliographic Details
Main Authors:	Mahaut, Matéo; Baroni, Marco
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.21621

Similar Items

Referential communication in heterogeneous communities of pre-trained visual deep networks
by: Mahaut, Matéo, et al.
Published: (2023)

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
by: Liu, Yong, et al.
Published: (2025)

When Does Pruning Benefit Vision Representations?
by: Cassano, Enrico, et al.
Published: (2025)

Pre-trained Models Succeed in Medical Imaging with Representation Similarity Degradation
by: Zu, Wenqiang, et al.
Published: (2025)

Concept Unlearning by Modeling Key Steps of Diffusion Process
by: Zhang, Chaoshuo, et al.
Published: (2025)

The Geometry of Representational Failures in Vision Language Models
by: Savietto, Daniele, et al.
Published: (2026)

Decoupled Similarity for Task-Aware Token Pruning in Large Vision-Language Models
by: Ma, Kexin, et al.
Published: (2026)

CAST: Cross-modal Alignment Similarity Test for Vision Language Models
by: Dagan, Gautier, et al.
Published: (2024)

Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)

Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
by: Xue, Chaocan, et al.
Published: (2025)

Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments
by: Li, Haoyuan, et al.
Published: (2026)

Uncovering Cultural Representation Disparities in Vision-Language Models
by: Kadiyala, Ram Mohan Rao, et al.
Published: (2025)

Comparing Computational Pathology Foundation Models using Representational Similarity Analysis
by: Mishra, Vaibhav, et al.
Published: (2025)

Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
by: Luo, Yulin, et al.
Published: (2026)

MSDS: Deep Structural Similarity with Multiscale Representation
by: Kang, Danling, et al.
Published: (2026)

Variational Adapter for Cross-modal Similarity Representation
by: Wei, WenZhang, et al.
Published: (2026)

Beyond Fidelity: Semantic Similarity Assessment in Low-Level Image Processing
by: Wang, Runjie, et al.
Published: (2026)

Hierarchical Process Reward Models are Symbolic Vision Learners
by: Zhang, Shan, et al.
Published: (2025)

Law of Vision Representation in MLLMs
by: Yang, Shijia, et al.
Published: (2024)

DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
by: Pan, Qingtao, et al.
Published: (2024)

Vision-based Vehicle Re-identification in Bridge Scenario using Flock Similarity
by: Zhang, Chunfeng, et al.
Published: (2024)

Improving Vision-language Models with Perception-centric Process Reward Models
by: Min, Yingqian, et al.
Published: (2026)

Sparsity Meets Similarity: Leveraging Long-Tail Distribution for Dynamic Optimized Token Representation in Multimodal Large Language Models
by: Yu, Gaotong, et al.
Published: (2024)

Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models
by: Nguyen, Minh Khoi, et al.
Published: (2026)

MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs
by: Sobotka, Jan, et al.
Published: (2025)

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
by: Xia, Peng, et al.
Published: (2023)

Do Vision Language Models Need to Process Image Tokens?
by: Ghosh, Sambit, et al.
Published: (2026)

FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
by: Pyo, Jiyoon, et al.
Published: (2025)

Case-Enhanced Vision Transformer: Improving Explanations of Image Similarity with a ViT-based Similarity Metric
by: Zhao, Ziwei, et al.
Published: (2024)

Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
by: Zhang, Yuan, et al.
Published: (2026)

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
by: Zhuang, Shaobin, et al.
Published: (2025)

Motion-Enhanced Nonlocal Similarity Implicit Neural Representation for Infrared Dim and Small Target Detection
by: Liu, Pei, et al.
Published: (2025)

SimCroP: Radiograph Representation Learning with Similarity-driven Cross-granularity Pre-training
by: Wang, Rongsheng, et al.
Published: (2025)

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
by: Min, Cheolhong, et al.
Published: (2026)

From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model
by: Hu, Bing, et al.
Published: (2026)

Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
by: Khan, Md Azim, et al.
Published: (2025)

Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models
by: Zhang, Enming, et al.
Published: (2024)

MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model
by: Wang, Xinyang, et al.
Published: (2024)

MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models
by: Guo, Yuncheng, et al.
Published: (2025)