Bibliographic Details
Main Author:	Schneider, Stefanie
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.20853

Similar Items

Have Large Vision-Language Models Mastered Art History?
by: Strafforello, Ombretta, et al.
Published: (2024)

EVLF-FM: Explainable Vision Language Foundation Model for Medicine
by: Bai, Yang, et al.
Published: (2025)

SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
by: Cheng, Jiajun, et al.
Published: (2025)

Training A Small Emotional Vision Language Model for Visual Art Comprehension
by: Zhang, Jing, et al.
Published: (2024)

Reasoning Riddles: How Explainability Reveals Cognitive Limits in Vision-Language Models
by: Movva, Prahitha
Published: (2025)

XDR-LVLM: An Explainable Vision-Language Large Model for Diabetic Retinopathy Diagnosis
by: Ito, Masato, et al.
Published: (2025)

LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
by: Song, Nan, et al.
Published: (2025)

Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection
by: Yu, Peipeng, et al.
Published: (2025)

Explainability for Vision Foundation Models: A Survey
by: Kazmierczak, Rémi, et al.
Published: (2025)

MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models
by: Xu, Wenbo, et al.
Published: (2026)

Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models
by: Hashmi, Anees Ur Rehman, et al.
Published: (2024)

Deep Learning for Robust and Explainable Models in Computer Vision
by: Amirian, Mohammadreza
Published: (2024)

VLEER: Vision and Language Embeddings for Explainable Whole Slide Image Representation
by: Nguyen, Anh Tien, et al.
Published: (2025)

Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
by: Shu, Dong, et al.
Published: (2025)

ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
by: Yuan, Zhengqing, et al.
Published: (2023)

GLIMPSE: Holistic Cross-Modal Explainability for Large Vision-Language Models
by: Shen, Guanxi
Published: (2025)

ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
by: Zhu, William Yicheng, et al.
Published: (2024)

ArtContext: Contextualizing Artworks with Open-Access Art History Articles and Wikidata Knowledge through a LoRA-Tuned CLIP Model
by: Waugh, Samuel, et al.
Published: (2026)

Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
by: Wang, Zhaochen, et al.
Published: (2025)

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
by: Taghadouini, Said, et al.
Published: (2026)

BiomedXPro: Prompt Optimization for Explainable Diagnosis with Biomedical Vision Language Models
by: Silva, Kaushitha, et al.
Published: (2025)

Exploring Superposition and Interference in State-of-the-Art Low-Parameter Vision Models
by: Hollard, Lilian, et al.
Published: (2025)

Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
by: Zhang, Xianren, et al.
Published: (2024)

DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models
by: Bousselham, Walid, et al.
Published: (2026)

X-Driver: Explainable Autonomous Driving with Vision-Language Models
by: Liu, Wei, et al.
Published: (2025)

HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
by: Koo, Myungkyu, et al.
Published: (2025)

ArtBrain: An Explainable end-to-end Toolkit for Classification and Attribution of AI-Generated Art and Style
by: Silva, Ravidu Suien Rammuni, et al.
Published: (2024)

Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation
by: Kim, Ju-Young, et al.
Published: (2025)

An Explainable Biomedical Foundation Model via Large-Scale Concept-Enhanced Vision-Language Pre-training
by: Nie, Yuxiang, et al.
Published: (2025)

Evaluating the Explainability of Vision Transformers in Medical Imaging
by: Barekatain, Leili, et al.
Published: (2025)

Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
by: Geigle, Gregor, et al.
Published: (2025)

Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion
by: Zhang, Jiaru, et al.
Published: (2026)

History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation
by: Ding, Xichen, et al.
Published: (2025)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
by: Deitke, Matt, et al.
Published: (2024)

Fine-tuning Vision Language Models with Graph-based Knowledge for Explainable Medical Image Analysis
by: Li, Chenjun, et al.
Published: (2025)

Decision-Aware Attention Propagation for Vision Transformer Explainability
by: Jo, Sehyeong, et al.
Published: (2026)

Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
by: Ge, Xuri, et al.
Published: (2024)

Keypoint Counting Classifiers: Turning Vision Transformers into Self-Explainable Models Without Training
by: Wickstrøm, Kristoffer, et al.
Published: (2025)

Reproducibility study of "LICO: Explainable Models with Language-Image Consistency"
by: Fletcher, Luan, et al.
Published: (2024)

Understanding the Risks of Asphalt Art to the Reliability of Vision-Based Perception Systems
by: Ma, Jin, et al.
Published: (2025)