| Main Author: | Schneider, Stefanie |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.20853 |
Similar Items
Have Large Vision-Language Models Mastered Art History?
by: Strafforello, Ombretta, et al.
Published: (2024)
EVLF-FM: Explainable Vision Language Foundation Model for Medicine
by: Bai, Yang, et al.
Published: (2025)
SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
by: Cheng, Jiajun, et al.
Published: (2025)
Training A Small Emotional Vision Language Model for Visual Art Comprehension
by: Zhang, Jing, et al.
Published: (2024)
Reasoning Riddles: How Explainability Reveals Cognitive Limits in Vision-Language Models
by: Movva, Prahitha
Published: (2025)
XDR-LVLM: An Explainable Vision-Language Large Model for Diabetic Retinopathy Diagnosis
by: Ito, Masato, et al.
Published: (2025)
LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
by: Song, Nan, et al.
Published: (2025)
Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection
by: Yu, Peipeng, et al.
Published: (2025)
Explainability for Vision Foundation Models: A Survey
by: Kazmierczak, Rémi, et al.
Published: (2025)
MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models
by: Xu, Wenbo, et al.
Published: (2026)
Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models
by: Hashmi, Anees Ur Rehman, et al.
Published: (2024)
Deep Learning for Robust and Explainable Models in Computer Vision
by: Amirian, Mohammadreza
Published: (2024)
VLEER: Vision and Language Embeddings for Explainable Whole Slide Image Representation
by: Nguyen, Anh Tien, et al.
Published: (2025)
Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
by: Shu, Dong, et al.
Published: (2025)
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
by: Yuan, Zhengqing, et al.
Published: (2023)
GLIMPSE: Holistic Cross-Modal Explainability for Large Vision-Language Models
by: Shen, Guanxi
Published: (2025)
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
by: Zhu, William Yicheng, et al.
Published: (2024)
ArtContext: Contextualizing Artworks with Open-Access Art History Articles and Wikidata Knowledge through a LoRA-Tuned CLIP Model
by: Waugh, Samuel, et al.
Published: (2026)
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
by: Wang, Zhaochen, et al.
Published: (2025)
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
by: Taghadouini, Said, et al.
Published: (2026)
BiomedXPro: Prompt Optimization for Explainable Diagnosis with Biomedical Vision Language Models
by: Silva, Kaushitha, et al.
Published: (2025)
Exploring Superposition and Interference in State-of-the-Art Low-Parameter Vision Models
by: Hollard, Lilian, et al.
Published: (2025)
Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
by: Zhang, Xianren, et al.
Published: (2024)
DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models
by: Bousselham, Walid, et al.
Published: (2026)
X-Driver: Explainable Autonomous Driving with Vision-Language Models
by: Liu, Wei, et al.
Published: (2025)
HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
by: Koo, Myungkyu, et al.
Published: (2025)
ArtBrain: An Explainable end-to-end Toolkit for Classification and Attribution of AI-Generated Art and Style
by: Silva, Ravidu Suien Rammuni, et al.
Published: (2024)
Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation
by: Kim, Ju-Young, et al.
Published: (2025)
An Explainable Biomedical Foundation Model via Large-Scale Concept-Enhanced Vision-Language Pre-training
by: Nie, Yuxiang, et al.
Published: (2025)
Evaluating the Explainability of Vision Transformers in Medical Imaging
by: Barekatain, Leili, et al.
Published: (2025)
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
by: Geigle, Gregor, et al.
Published: (2025)
Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion
by: Zhang, Jiaru, et al.
Published: (2026)
History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation
by: Ding, Xichen, et al.
Published: (2025)
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
by: Deitke, Matt, et al.
Published: (2024)
Fine-tuning Vision Language Models with Graph-based Knowledge for Explainable Medical Image Analysis
by: Li, Chenjun, et al.
Published: (2025)
Decision-Aware Attention Propagation for Vision Transformer Explainability
by: Jo, Sehyeong, et al.
Published: (2026)
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
by: Ge, Xuri, et al.
Published: (2024)
Keypoint Counting Classifiers: Turning Vision Transformers into Self-Explainable Models Without Training
by: Wickstrøm, Kristoffer, et al.
Published: (2025)
Reproducibility study of "LICO: Explainable Models with Language-Image Consistency"
by: Fletcher, Luan, et al.
Published: (2024)
Understanding the Risks of Asphalt Art to the Reliability of Vision-Based Perception Systems
by: Ma, Jin, et al.
Published: (2025)