:: Library Catalog

Bibliographic Details
Main Authors:	Zaazou, Youssef; Thomas, Mark
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.11107

Similar Items

Reliability Estimation of News Media Sources: Birds of a Feather Flock Together
by: Burdisso, Sergio, et al.
Published: (2024)

Same Answer, Different Representations: Hidden instability in VLMs
by: Wani, Farooq Ahmad, et al.
Published: (2026)

Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations
by: Mohammed, Fatma Youssef, et al.
Published: (2025)

What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
by: Kang, Inha, et al.
Published: (2025)

Rethink MAE with Linear Time-Invariant Dynamics
by: Wang, Zice
Published: (2026)

Fine-grained Background Representation for Weakly Supervised Semantic Segmentation
by: Yin, Xu, et al.
Published: (2024)

VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
by: Berman, Shmuel, et al.
Published: (2025)

VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT
by: Kang, Hyeonsu, et al.
Published: (2025)

Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
by: Tan, Zhaorui, et al.
Published: (2024)

Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation
by: Li, Yuxin, et al.
Published: (2024)

Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs
by: Chen, Yanlong, et al.
Published: (2026)

CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization
by: Liang, Yue, et al.
Published: (2026)

Caption This, Reason That: VLMs Caught in the Middle
by: Weng, Zihan, et al.
Published: (2025)

Evaluating Compositional Generalisation in VLMs and Diffusion Models
by: Pearson, Beth, et al.
Published: (2025)

VACoT: Rethinking Visual Data Augmentation with VLMs
by: Xu, Zhengzhuo, et al.
Published: (2025)

Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)

Line of Sight: On Linear Representations in VLLMs
by: Rajaram, Achyuta, et al.
Published: (2025)

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
by: Ma, Jiawei, et al.
Published: (2024)

Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization
by: Lan, Yuqin, et al.
Published: (2026)

SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
by: Zhang, Yuyou, et al.
Published: (2025)

Birds of a Different Feather Flock Together: Exploring Opportunities and Challenges in Animal-Human-Machine Teaming
by: Cohen, Myke C., et al.
Published: (2025)

Stateful Token Reduction for Long-Video Hybrid VLMs
by: Jiang, Jindong, et al.
Published: (2026)

MolmoPoint: Better Pointing for VLMs with Grounding Tokens
by: Clark, Christopher, et al.
Published: (2026)

Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
by: Ge, Yuyao, et al.
Published: (2025)

Towards Lossless Ultimate Vision Token Compression for VLMs
by: Zheng, Dehua, et al.
Published: (2025)

Treble Counterfactual VLMs: A Causal Approach to Hallucination
by: Li, Shawn, et al.
Published: (2025)

WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift
by: Santamaria, Julian D., et al.
Published: (2026)

Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation
by: Zhu, Chunzheng, et al.
Published: (2026)

Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
by: Izadi, Amirmohammad, et al.
Published: (2025)

D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
by: Huang, Yiyang, et al.
Published: (2025)

3D Primitives are a Spatial Language for VLMs
by: Liu, Junze, et al.
Published: (2026)

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage
by: Khan, Ufaq, et al.
Published: (2026)

To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs
by: Hong, Rui, et al.
Published: (2026)

Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
by: Li, Chenjun
Published: (2026)

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)

TAPS : Frustratingly Simple Test Time Active Learning for VLMs
by: Sarkar, Dhruv, et al.
Published: (2025)

DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
by: Wang, Zhenhailong, et al.
Published: (2025)

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
by: Park, Jaehyun, et al.
Published: (2026)

T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs
by: Xia, Shao-Jun, et al.
Published: (2025)

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs
by: Avogaro, Niccolo, et al.
Published: (2026)