:: Library Catalog

Bibliographic Details
Main Authors:	Singh, Anshul; Chaudhary, Rohan; Singh, Gagneet; Kumary, Abhay
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.17238

Similar Items

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
by: Li, Shuo, et al.
Published: (2024)

When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2025)

M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
by: Anugraha, David, et al.
Published: (2025)

Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness
by: Mukhopadhyay, Srija, et al.
Published: (2024)

Can World Models Benefit VLMs for World Dynamics?
by: Zhang, Kevin, et al.
Published: (2025)

AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs
by: Basappa, Aahana, et al.
Published: (2026)

Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification
by: Singh, Vikash, et al.
Published: (2026)

MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space
by: Singh, Anshul, et al.
Published: (2025)

Tone Matters: The Impact of Linguistic Tone on Hallucination in VLMs
by: Hong, Weihao, et al.
Published: (2026)

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)

iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs
by: Mayer, Julius, et al.
Published: (2025)

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
by: Zhou, Guanyu, et al.
Published: (2026)

Can VLMs Recall Factual Associations From Visual References?
by: Ashok, Dhananjay, et al.
Published: (2025)

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
by: Wang, Dianyi, et al.
Published: (2025)

Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs
by: Saxena, Rohit, et al.
Published: (2025)

POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation
by: Joshi, Abhinav, et al.
Published: (2025)

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs
by: Avogaro, Niccolo, et al.
Published: (2026)

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)

GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs
by: Nguyen, Duy, et al.
Published: (2025)

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage
by: Khan, Ufaq, et al.
Published: (2026)

Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation
by: Deng, Ken, et al.
Published: (2026)

Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption
by: Erol, Mehmet Kaan
Published: (2026)

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)

Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
by: Oh, Youngtaek, et al.
Published: (2024)

STAR: A Benchmark for Situated Reasoning in Real-World Videos
by: Wu, Bo, et al.
Published: (2024)

When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification
by: Pang, Zirui, et al.
Published: (2025)

FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection
by: Bhaskar, Paramananda, et al.
Published: (2026)

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
by: Liang, Yijun, et al.
Published: (2025)

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
by: Zhang, YiFan, et al.
Published: (2024)

Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
by: Chen, Zijun, et al.
Published: (2024)

VLMs Can Aggregate Scattered Training Patches
by: Zhou, Zhanhui, et al.
Published: (2025)

Are VLMs Really Blind
by: Singh, Ayush, et al.
Published: (2024)

Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
by: Pan, Zhiyu, et al.
Published: (2026)

How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning
by: Yang, Luyu, et al.
Published: (2026)

Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation
by: Vaidya, Shreyas, et al.
Published: (2023)

A Multimodal, Multitask System for Generating E Commerce Text Listings from Images
by: Singh, Nayan Kumar
Published: (2025)

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
by: Wang, Hao, et al.
Published: (2026)

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
by: Karamcheti, Siddharth, et al.
Published: (2024)

Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
by: Singh, Ayush, et al.
Published: (2024)