Saved in:
| Main Authors: | Kawamura, Takao; Niizumi, Daisuke; Ono, Nobutaka |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Audio and Speech Processing; Sound |
| Online Access: | https://arxiv.org/abs/2602.15307 |
| _version_ | 1866908837431214080 |
|---|---|
| author | Kawamura, Takao; Niizumi, Daisuke; Ono, Nobutaka |
| author_facet | Kawamura, Takao; Niizumi, Daisuke; Ono, Nobutaka |
| contents | In this paper, we analyze the internal representations of a general-purpose audio self-supervised learning (SSL) model from a neuron-level perspective. Despite their strong empirical performance as feature extractors, the internal mechanisms underlying the robust generalization of SSL audio models remain unclear. Drawing on the framework of mechanistic interpretability, we identify and examine class-specific neurons by analyzing conditional activation patterns across diverse tasks. Our analysis reveals that SSL models foster the emergence of class-specific neurons that provide extensive coverage across novel task classes. These neurons exhibit shared responses across different semantic categories and acoustic similarities, such as speech attributes and musical pitch. We also confirm that these neurons have a functional impact on classification performance. To our knowledge, this is the first systematic neuron-level analysis of a general-purpose audio SSL model, providing new insights into its internal representation. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_15307 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model |
| topic | Audio and Speech Processing; Sound |
| url | https://arxiv.org/abs/2602.15307 |
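
The abstract in this record mentions identifying class-specific neurons by analyzing conditional activation patterns. The record does not give the procedure, so the following is only a minimal sketch of one plausible reading: estimate how often each neuron is active given a class, then keep neurons that fire often for one class and rarely for all others. The function names, the activation threshold, the two cutoffs, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def activation_probabilities(activations, labels, threshold=0.0):
    """Estimate P(neuron active | class) from per-sample activations.

    activations: list of per-sample lists, one value per neuron.
    labels: list of class labels, one per sample.
    A neuron counts as "active" when its value exceeds `threshold`
    (an assumed criterion, not taken from the paper).
    """
    counts = defaultdict(int)
    active = {}
    for row, label in zip(activations, labels):
        counts[label] += 1
        hits = active.setdefault(label, [0] * len(row))
        for j, value in enumerate(row):
            if value > threshold:
                hits[j] += 1
    return {c: [h / counts[c] for h in hits] for c, hits in active.items()}


def class_specific_neurons(probs, min_prob=0.9, max_other=0.1):
    """Return {class: [neuron indices]} for neurons that fire often for one
    class and rarely for every other class (both cutoffs are assumptions)."""
    selected = {}
    for c, p in probs.items():
        selected[c] = [
            j for j, pj in enumerate(p)
            if pj >= min_prob
            and all(q[j] <= max_other for d, q in probs.items() if d != c)
        ]
    return selected


# Toy data: 4 samples, 3 neurons, two hypothetical classes.
activations = [
    [1.2, 0.0, 0.5],
    [0.9, 0.0, 0.7],
    [0.0, 1.1, 0.4],
    [0.0, 0.8, 0.6],
]
labels = ["speech", "speech", "music", "music"]

probs = activation_probabilities(activations, labels)
neurons = class_specific_neurons(probs)
print(neurons)
```

In the toy data, neuron 2 is active for both classes, so it is not selected for either one; only neurons 0 and 1 pass the selectivity check under these assumed cutoffs.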