| Main Authors: | Zaazou, Youssef, Thomas, Mark |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.11107 |
Similar Items
Reliability Estimation of News Media Sources: Birds of a Feather Flock Together
by: Burdisso, Sergio, et al.
Published: (2024)
Same Answer, Different Representations: Hidden instability in VLMs
by: Wani, Farooq Ahmad, et al.
Published: (2026)
Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations
by: Mohammed, Fatma Youssef, et al.
Published: (2025)
What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
by: Kang, Inha, et al.
Published: (2025)
Rethink MAE with Linear Time-Invariant Dynamics
by: Wang, Zice
Published: (2026)
Fine-grained Background Representation for Weakly Supervised Semantic Segmentation
by: Yin, Xu, et al.
Published: (2024)
VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
by: Berman, Shmuel, et al.
Published: (2025)
VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT
by: Kang, Hyeonsu, et al.
Published: (2025)
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
by: Tan, Zhaorui, et al.
Published: (2024)
Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation
by: Li, Yuxin, et al.
Published: (2024)
Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs
by: Chen, Yanlong, et al.
Published: (2026)
CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization
by: Liang, Yue, et al.
Published: (2026)
Caption This, Reason That: VLMs Caught in the Middle
by: Weng, Zihan, et al.
Published: (2025)
Evaluating Compositional Generalisation in VLMs and Diffusion Models
by: Pearson, Beth, et al.
Published: (2025)
VACoT: Rethinking Visual Data Augmentation with VLMs
by: Xu, Zhengzhuo, et al.
Published: (2025)
Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)
Line of Sight: On Linear Representations in VLLMs
by: Rajaram, Achyuta, et al.
Published: (2025)
WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
by: Ma, Jiawei, et al.
Published: (2024)
Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization
by: Lan, Yuqin, et al.
Published: (2026)
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
by: Zhang, Yuyou, et al.
Published: (2025)
Birds of a Different Feather Flock Together: Exploring Opportunities and Challenges in Animal-Human-Machine Teaming
by: Cohen, Myke C., et al.
Published: (2025)
Stateful Token Reduction for Long-Video Hybrid VLMs
by: Jiang, Jindong, et al.
Published: (2026)
MolmoPoint: Better Pointing for VLMs with Grounding Tokens
by: Clark, Christopher, et al.
Published: (2026)
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
by: Ge, Yuyao, et al.
Published: (2025)
Towards Lossless Ultimate Vision Token Compression for VLMs
by: Zheng, Dehua, et al.
Published: (2025)
Treble Counterfactual VLMs: A Causal Approach to Hallucination
by: Li, Shawn, et al.
Published: (2025)
WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift
by: Santamaria, Julian D., et al.
Published: (2026)
Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation
by: Zhu, Chunzheng, et al.
Published: (2026)
Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs
by: Izadi, Amirmohammad, et al.
Published: (2025)
D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
by: Huang, Yiyang, et al.
Published: (2025)
3D Primitives are a Spatial Language for VLMs
by: Liu, Junze, et al.
Published: (2026)
MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage
by: Khan, Ufaq, et al.
Published: (2026)
To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs
by: Hong, Rui, et al.
Published: (2026)
Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
by: Li, Chenjun
Published: (2026)
Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)
TAPS : Frustratingly Simple Test Time Active Learning for VLMs
by: Sarkar, Dhruv, et al.
Published: (2025)
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
by: Wang, Zhenhailong, et al.
Published: (2025)
See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
by: Park, Jaehyun, et al.
Published: (2026)
T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs
by: Xia, Shao-Jun, et al.
Published: (2025)
SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs
by: Avogaro, Niccolo, et al.
Published: (2026)