| Main Authors: | Mahaut, Matéo; Baroni, Marco |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21621 |
Similar Items
Referential communication in heterogeneous communities of pre-trained visual deep networks
by: Mahaut, Matéo, et al.
Published: (2023)
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
by: Xu, Guowei, et al.
Published: (2024)
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
by: Liu, Yong, et al.
Published: (2025)
When Does Pruning Benefit Vision Representations?
by: Cassano, Enrico, et al.
Published: (2025)
Pre-trained Models Succeed in Medical Imaging with Representation Similarity Degradation
by: Zu, Wenqiang, et al.
Published: (2025)
Concept Unlearning by Modeling Key Steps of Diffusion Process
by: Zhang, Chaoshuo, et al.
Published: (2025)
The Geometry of Representational Failures in Vision Language Models
by: Savietto, Daniele, et al.
Published: (2026)
Decoupled Similarity for Task-Aware Token Pruning in Large Vision-Language Models
by: Ma, Kexin, et al.
Published: (2026)
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
by: Dagan, Gautier, et al.
Published: (2024)
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
by: Xue, Chaocan, et al.
Published: (2025)
Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments
by: Li, Haoyuan, et al.
Published: (2026)
Uncovering Cultural Representation Disparities in Vision-Language Models
by: Kadiyala, Ram Mohan Rao, et al.
Published: (2025)
Comparing Computational Pathology Foundation Models using Representational Similarity Analysis
by: Mishra, Vaibhav, et al.
Published: (2025)
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
by: Luo, Yulin, et al.
Published: (2026)
MSDS: Deep Structural Similarity with Multiscale Representation
by: Kang, Danling, et al.
Published: (2026)
Variational Adapter for Cross-modal Similarity Representation
by: Wei, WenZhang, et al.
Published: (2026)
Beyond Fidelity: Semantic Similarity Assessment in Low-Level Image Processing
by: Wang, Runjie, et al.
Published: (2026)
Hierarchical Process Reward Models are Symbolic Vision Learners
by: Zhang, Shan, et al.
Published: (2025)
Law of Vision Representation in MLLMs
by: Yang, Shijia, et al.
Published: (2024)
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
by: Pan, Qingtao, et al.
Published: (2024)
Vision-based Vehicle Re-identification in Bridge Scenario using Flock Similarity
by: Zhang, Chunfeng, et al.
Published: (2024)
Improving Vision-language Models with Perception-centric Process Reward Models
by: Min, Yingqian, et al.
Published: (2026)
Sparsity Meets Similarity: Leveraging Long-Tail Distribution for Dynamic Optimized Token Representation in Multimodal Large Language Models
by: Yu, Gaotong, et al.
Published: (2024)
Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models
by: Nguyen, Minh Khoi, et al.
Published: (2026)
MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs
by: Sobotka, Jan, et al.
Published: (2025)
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
by: Xia, Peng, et al.
Published: (2023)
Do Vision Language Models Need to Process Image Tokens?
by: Ghosh, Sambit, et al.
Published: (2026)
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
by: Pyo, Jiyoon, et al.
Published: (2025)
Case-Enhanced Vision Transformer: Improving Explanations of Image Similarity with a ViT-based Similarity Metric
by: Zhao, Ziwei, et al.
Published: (2024)
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
by: Zhang, Yuan, et al.
Published: (2026)
TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
by: Zhuang, Shaobin, et al.
Published: (2025)
Motion-Enhanced Nonlocal Similarity Implicit Neural Representation for Infrared Dim and Small Target Detection
by: Liu, Pei, et al.
Published: (2025)
SimCroP: Radiograph Representation Learning with Similarity-driven Cross-granularity Pre-training
by: Wang, Rongsheng, et al.
Published: (2025)
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
by: Min, Cheolhong, et al.
Published: (2026)
From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model
by: Hu, Bing, et al.
Published: (2026)
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
by: Khan, Md Azim, et al.
Published: (2025)
Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models
by: Zhang, Enming, et al.
Published: (2024)
MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model
by: Wang, Xinyang, et al.
Published: (2024)
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models
by: Guo, Yuncheng, et al.
Published: (2025)