| Main Authors: | He, Haoyu; Zhuo, Yue; Zheng, Yu; Wang, Qi R. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.27070 |
Similar Items
HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation
by: Wang, Haoyu, et al.
Published: (2026)
Probing and Inducing Combinational Creativity in Vision-Language Models
by: Peng, Yongqian, et al.
Published: (2025)
Probing Perceptual Constancy in Large Vision-Language Models
by: Sun, Haoran, et al.
Published: (2025)
UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)
Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities
by: Xia, Shiyu, et al.
Published: (2024)
TraversalBench: Challenging Paths to Follow for Vision Language Models
by: Petrova, Clara, et al.
Published: (2026)
Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales
by: Qian, Zhaofang, et al.
Published: (2025)
From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models
by: Han, Tiancheng, et al.
Published: (2025)
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
by: Min, Cheolhong, et al.
Published: (2026)
DexVLG: Dexterous Vision-Language-Grasp Model at Scale
by: He, Jiawei, et al.
Published: (2025)
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
by: Xia, Peng, et al.
Published: (2023)
Joint Vision-Language Social Bias Removal for CLIP
by: Zhang, Haoyu, et al.
Published: (2024)
GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
by: Yin, Hang, et al.
Published: (2025)
RULER-Bench: Probing Rule-based Reasoning Abilities of Next-level Video Generation Models for Vision Foundation Intelligence
by: He, Xuming, et al.
Published: (2025)
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
by: Guan, Jiwei, et al.
Published: (2024)
Vision Large Language Models Are Good Noise Handlers in Engagement Analysis
by: Vedernikov, Alexander, et al.
Published: (2025)
Conceptual Codebook Learning for Vision-Language Models
by: Zhang, Yi, et al.
Published: (2024)
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
by: Wang, Yu, et al.
Published: (2024)
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
by: Yang, Fan, et al.
Published: (2024)
Language-Guided Invariance Probing of Vision-Language Models
by: Lee, Jae Joong
Published: (2025)
Co-Training Vision Language Models for Remote Sensing Multi-task Learning
by: Li, Qingyun, et al.
Published: (2025)
VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck
by: Zhang, Feiran, et al.
Published: (2026)
Integrated Structural Prompt Learning for Vision-Language Models
by: Wang, Jiahui, et al.
Published: (2025)
Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning
by: Huang, Zhuo, et al.
Published: (2023)
Rethinking Noise-Robust Training for Frozen Vision Foundation Models: A Cross-Dataset Benchmark with a Case Study of Small-Loss Failure
by: Li, Zitong, et al.
Published: (2026)
ESceme: Vision-and-Language Navigation with Episodic Scene Memory
by: Zheng, Qi, et al.
Published: (2023)
Hierarchical Cross-modal Prompt Learning for Vision-Language Models
by: Zheng, Hao, et al.
Published: (2025)
Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model
by: Wang, Chenfeng, et al.
Published: (2026)
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
by: Zhao, Zongchuang, et al.
Published: (2025)
Safety Alignment for Vision Language Models
by: Liu, Zhendong, et al.
Published: (2024)
Enhancing Vision-Language Model with Unmasked Token Alignment
by: Liu, Jihao, et al.
Published: (2024)
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
by: Peng, Wenshuo, et al.
Published: (2024)
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding
by: Xuan, Weihao, et al.
Published: (2025)
Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models
by: Zheng, Yue, et al.
Published: (2025)
Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
by: Lu, Fan, et al.
Published: (2024)
Reasoning or Pattern Matching? Probing Large Vision-Language Models with Visual Puzzles
by: Lymperaiou, Maria, et al.
Published: (2026)
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
by: Jiang, Haoyu, et al.
Published: (2024)