Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Doumanoglou, Alexandros, Driessens, Kurt, Zarpalas, Dimitrios
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.23926
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908874733256704
author	Doumanoglou, Alexandros Driessens, Kurt Zarpalas, Dimitrios
author_facet	Doumanoglou, Alexandros Driessens, Kurt Zarpalas, Dimitrios
contents	Empirical evidence shows that deep vision networks often represent concepts as directions in latent space with concept information written along directional components in the vector representation of the input. However, the mechanism to encode (write) and decode (read) concept information to and from vector representations is not directly accessible as it constitutes a latent mechanism that naturally emerges from the training process of the network. Recovering this mechanism unlocks significant potential to open the black-box nature of deep networks, enabling understanding, debugging, and improving deep learning models. In this work, we propose an unsupervised method to recover this mechanism. For each concept, we explain that under the hypothesis of linear concept representations, this mechanism can be implemented with the help of two directions: the first facilitating encoding of concept information and the second facilitating decoding. Unlike prior matrix decomposition, autoencoder, or dictionary learning methods that rely on feature reconstruction, we propose a new perspective: decoding directions are identified via directional clustering of activations, and encoding directions are estimated with signal vectors under a probabilistic view. We further leverage network weights through a novel technique, Uncertainty Region Alignment, which reveals interpretable directions affecting predictions. Our analysis shows that (a) on synthetic data, our method recovers ground-truth direction pairs; (b) on real data, decoding directions map to monosemantic, interpretable concepts and outperform unsupervised baselines; and (c) signal vectors faithfully estimate encoding directions, validated via activation maximization. Finally, we demonstrate applications in understanding global model behavior, explaining individual predictions, and intervening to produce counterfactuals or correct errors.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_23926
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks Doumanoglou, Alexandros Driessens, Kurt Zarpalas, Dimitrios Computer Vision and Pattern Recognition Empirical evidence shows that deep vision networks often represent concepts as directions in latent space with concept information written along directional components in the vector representation of the input. However, the mechanism to encode (write) and decode (read) concept information to and from vector representations is not directly accessible as it constitutes a latent mechanism that naturally emerges from the training process of the network. Recovering this mechanism unlocks significant potential to open the black-box nature of deep networks, enabling understanding, debugging, and improving deep learning models. In this work, we propose an unsupervised method to recover this mechanism. For each concept, we explain that under the hypothesis of linear concept representations, this mechanism can be implemented with the help of two directions: the first facilitating encoding of concept information and the second facilitating decoding. Unlike prior matrix decomposition, autoencoder, or dictionary learning methods that rely on feature reconstruction, we propose a new perspective: decoding directions are identified via directional clustering of activations, and encoding directions are estimated with signal vectors under a probabilistic view. We further leverage network weights through a novel technique, Uncertainty Region Alignment, which reveals interpretable directions affecting predictions. Our analysis shows that (a) on synthetic data, our method recovers ground-truth direction pairs; (b) on real data, decoding directions map to monosemantic, interpretable concepts and outperform unsupervised baselines; and (c) signal vectors faithfully estimate encoding directions, validated via activation maximization. Finally, we demonstrate applications in understanding global model behavior, explaining individual predictions, and intervening to produce counterfactuals or correct errors.
title	Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2509.23926

Similar Items