Bibliographic Details
Main Authors: Kawamura, Takao; Niizumi, Daisuke; Ono, Nobutaka
Format: Preprint
Published: 2026
Subjects: Audio and Speech Processing; Sound
Online Access: https://arxiv.org/abs/2602.15307
_version_ 1866908837431214080
author Kawamura, Takao
Niizumi, Daisuke
Ono, Nobutaka
contents In this paper, we analyze the internal representations of a general-purpose audio self-supervised learning (SSL) model from a neuron-level perspective. Despite their strong empirical performance as feature extractors, the internal mechanisms underlying the robust generalization of SSL audio models remain unclear. Drawing on the framework of mechanistic interpretability, we identify and examine class-specific neurons by analyzing conditional activation patterns across diverse tasks. Our analysis reveals that SSL models foster the emergence of class-specific neurons that provide extensive coverage across novel task classes. These neurons exhibit shared responses across different semantic categories and acoustic similarities, such as speech attributes and musical pitch. We also confirm that these neurons have a functional impact on classification performance. To our knowledge, this is the first systematic neuron-level analysis of a general-purpose audio SSL model, providing new insights into its internal representation.
format Preprint
id arxiv_https___arxiv_org_abs_2602_15307
institution arXiv
publishDate 2026
record_format arxiv
title What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model
topic Audio and Speech Processing
Sound
url https://arxiv.org/abs/2602.15307