| Main Authors: | Singh, Anshul, Chaudhary, Rohan, Singh, Gagneet, Kumary, Abhay |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.17238 |
Similar Items
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
by: Li, Shuo, et al.
Published: (2024)
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2025)
M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
by: Anugraha, David, et al.
Published: (2025)
Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness
by: Mukhopadhyay, Srija, et al.
Published: (2024)
Can World Models Benefit VLMs for World Dynamics?
by: Zhang, Kevin, et al.
Published: (2025)
AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs
by: Basappa, Aahana, et al.
Published: (2026)
Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification
by: Singh, Vikash, et al.
Published: (2026)
MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space
by: Singh, Anshul, et al.
Published: (2025)
Tone Matters: The Impact of Linguistic Tone on Hallucination in VLMs
by: Hong, Weihao, et al.
Published: (2026)
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs
by: Mayer, Julius, et al.
Published: (2025)
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
by: Zhou, Guanyu, et al.
Published: (2026)
Can VLMs Recall Factual Associations From Visual References?
by: Ashok, Dhananjay, et al.
Published: (2025)
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
by: Wang, Dianyi, et al.
Published: (2025)
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs
by: Saxena, Rohit, et al.
Published: (2025)
POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation
by: Joshi, Abhinav, et al.
Published: (2025)
SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs
by: Avogaro, Niccolo, et al.
Published: (2026)
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)
GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs
by: Nguyen, Duy, et al.
Published: (2025)
MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage
by: Khan, Ufaq, et al.
Published: (2026)
Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation
by: Deng, Ken, et al.
Published: (2026)
Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption
by: Erol, Mehmet Kaan
Published: (2026)
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
by: Oh, Youngtaek, et al.
Published: (2024)
STAR: A Benchmark for Situated Reasoning in Real-World Videos
by: Wu, Bo, et al.
Published: (2024)
When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification
by: Pang, Zirui, et al.
Published: (2025)
FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection
by: Bhaskar, Paramananda, et al.
Published: (2026)
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
by: Liang, Yijun, et al.
Published: (2025)
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
by: Zhang, YiFan, et al.
Published: (2024)
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
by: Chen, Zijun, et al.
Published: (2024)
VLMs Can Aggregate Scattered Training Patches
by: Zhou, Zhanhui, et al.
Published: (2025)
Are VLMs Really Blind
by: Singh, Ayush, et al.
Published: (2024)
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
by: Pan, Zhiyu, et al.
Published: (2026)
How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning
by: Yang, Luyu, et al.
Published: (2026)
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation
by: Vaidya, Shreyas, et al.
Published: (2023)
A Multimodal, Multitask System for Generating E Commerce Text Listings from Images
by: Singh, Nayan Kumar
Published: (2025)
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
by: Wang, Hao, et al.
Published: (2026)
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
by: Karamcheti, Siddharth, et al.
Published: (2024)
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
by: Singh, Ayush, et al.
Published: (2024)