| Main Authors: | Kang, Hyeonsu, Bao, Emily, Goswami, Anjan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.22045 |
Similar Items
Gaze-VLM:Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding
by: Pani, Anupam, et al.
Published: (2025)
Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis
by: Li, Frank, et al.
Published: (2025)
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)
Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models
by: Jang, Jonggyu, et al.
Published: (2024)
VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
by: Berman, Shmuel, et al.
Published: (2025)
What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
by: Kang, Inha, et al.
Published: (2025)
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
by: He, Zheqi, et al.
Published: (2025)
Evaluating Compositional Generalisation in VLMs and Diffusion Models
by: Pearson, Beth, et al.
Published: (2025)
An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity
by: Lee, Yuxiao, et al.
Published: (2025)
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
by: Kim, Minkyu, et al.
Published: (2026)
RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
by: Wang, Yi Ru, et al.
Published: (2025)
Trust but Verify: Programmatic VLM Evaluation in the Wild
by: Prabhu, Viraj, et al.
Published: (2024)
VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
by: Saxena, Rohit, et al.
Published: (2026)
IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
by: Faraz, Ali, et al.
Published: (2025)
EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation
by: Han, Shuhao, et al.
Published: (2024)
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
by: Jian, Ai, et al.
Published: (2025)
Sim2Radar: Toward Bridging the Radar Sim-to-Real Gap with VLM-Guided Scene Reconstruction
by: Bejerano, Emily, et al.
Published: (2026)
Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models
by: Jiang, Yifan, et al.
Published: (2025)
Empowering Semantic-Sensitive Underwater Image Enhancement with VLM
by: Fan, Guodong, et al.
Published: (2026)
DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
by: Meng, Zijie, et al.
Published: (2025)
Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)
GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
by: Kamath, Amita, et al.
Published: (2025)
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
by: Peng, Tianhao, et al.
Published: (2025)
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)
Value-Guided Iterative Refinement and the DIQ-H Benchmark for Evaluating VLM Robustness
by: Wan, Hanwen, et al.
Published: (2025)
Beyond the Pixels: VLM-based Evaluation of Identity Preservation in Reference-Guided Synthesis
by: Singhania, Aditi, et al.
Published: (2025)
Caption This, Reason That: VLMs Caught in the Middle
by: Weng, Zihan, et al.
Published: (2025)
Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs
by: Zaazou, Youssef, et al.
Published: (2026)
ThermEval: A Structured Benchmark for Evaluation of Vision-Language Models on Thermal Imagery
by: Shrivastava, Ayush, et al.
Published: (2026)
edgeVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer
by: Qian, Chen, et al.
Published: (2025)
DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
by: Kang, Donggoo, et al.
Published: (2024)
Unveiling Hidden Visual Information: A Reconstruction Attack Against Adversarial Visual Information Hiding
by: Jang, Jonggyu, et al.
Published: (2024)
VACoT: Rethinking Visual Data Augmentation with VLMs
by: Xu, Zhengzhuo, et al.
Published: (2025)
Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)
AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval
by: Maniyar, Suyash, et al.
Published: (2025)
VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images
by: Sarowar, Md Selim, et al.
Published: (2025)
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
by: Ge, Yuyao, et al.
Published: (2025)
Towards Lossless Ultimate Vision Token Compression for VLMs
by: Zheng, Dehua, et al.
Published: (2025)