:: Library Catalog

Bibliographic Details
Main Authors:	Kang, Hyeonsu; Bao, Emily; Goswami, Anjan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.22045

Similar Items

Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding
by: Pani, Anupam, et al.
Published: (2025)

Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis
by: Li, Frank, et al.
Published: (2025)

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)

Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models
by: Jang, Jonggyu, et al.
Published: (2024)

VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
by: Berman, Shmuel, et al.
Published: (2025)

What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
by: Kang, Inha, et al.
Published: (2025)

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
by: He, Zheqi, et al.
Published: (2025)

Evaluating Compositional Generalisation in VLMs and Diffusion Models
by: Pearson, Beth, et al.
Published: (2025)

An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity
by: Lee, Yuxiao, et al.
Published: (2025)

VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
by: Kim, Minkyu, et al.
Published: (2026)

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
by: Wang, Yi Ru, et al.
Published: (2025)

Trust but Verify: Programmatic VLM Evaluation in the Wild
by: Prabhu, Viraj, et al.
Published: (2024)

VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
by: Saxena, Rohit, et al.
Published: (2026)

IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
by: Faraz, Ali, et al.
Published: (2025)

EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation
by: Han, Shuhao, et al.
Published: (2024)

CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
by: Jian, Ai, et al.
Published: (2025)

Sim2Radar: Toward Bridging the Radar Sim-to-Real Gap with VLM-Guided Scene Reconstruction
by: Bejerano, Emily, et al.
Published: (2026)

Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models
by: Jiang, Yifan, et al.
Published: (2025)

Empowering Semantic-Sensitive Underwater Image Enhancement with VLM
by: Fan, Guodong, et al.
Published: (2026)

DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
by: Meng, Zijie, et al.
Published: (2025)

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
by: Kamath, Amita, et al.
Published: (2025)

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
by: Peng, Tianhao, et al.
Published: (2025)

UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)

Value-Guided Iterative Refinement and the DIQ-H Benchmark for Evaluating VLM Robustness
by: Wan, Hanwen, et al.
Published: (2025)

Beyond the Pixels: VLM-based Evaluation of Identity Preservation in Reference-Guided Synthesis
by: Singhania, Aditi, et al.
Published: (2025)

Caption This, Reason That: VLMs Caught in the Middle
by: Weng, Zihan, et al.
Published: (2025)

Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs
by: Zaazou, Youssef, et al.
Published: (2026)

ThermEval: A Structured Benchmark for Evaluation of Vision-Language Models on Thermal Imagery
by: Shrivastava, Ayush, et al.
Published: (2026)

edgeVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer
by: Qian, Chen, et al.
Published: (2025)

DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)

VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
by: Kang, Donggoo, et al.
Published: (2024)

Unveiling Hidden Visual Information: A Reconstruction Attack Against Adversarial Visual Information Hiding
by: Jang, Jonggyu, et al.
Published: (2024)

VACoT: Rethinking Visual Data Augmentation with VLMs
by: Xu, Zhengzhuo, et al.
Published: (2025)

Listener-Rewarded Thinking in VLMs for Image Preferences
by: Gambashidze, Alexander, et al.
Published: (2025)

AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval
by: Maniyar, Suyash, et al.
Published: (2025)

VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images
by: Sarowar, Md Selim, et al.
Published: (2025)

Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning
by: Ge, Yuyao, et al.
Published: (2025)

Towards Lossless Ultimate Vision Token Compression for VLMs
by: Zheng, Dehua, et al.
Published: (2025)