Staff View: Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging :: Library Catalog

Bibliographic Details
Main Authors:	Tuncay, Ludovic; Labbé, Etienne; Pellegrini, Thomas
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2503.21826

_version_	1866912298639032320
author	Tuncay, Ludovic; Labbé, Etienne; Pellegrini, Thomas
author_facet	Tuncay, Ludovic; Labbé, Etienne; Pellegrini, Thomas
contents	AudioSet is one of the most used and largest datasets in audio tagging, containing about 2 million audio samples that are manually labeled with 527 event categories organized into an ontology. However, the annotations contain inconsistencies, particularly where categories that should be labeled as positive according to the ontology are frequently mislabeled as negative. To address this issue, we apply Hierarchical Label Propagation (HLP), which propagates labels up the ontology hierarchy, resulting in a mean increase in positive labels per audio clip from 1.98 to 2.39 and affecting 109 out of the 527 classes. Our results demonstrate that HLP provides performance benefits across various model architectures, including convolutional neural networks (PANN's CNN6 and ConvNeXT) and transformers (PaSST), with smaller models showing more improvements. Finally, on FSD50K, another widely used dataset, models trained on AudioSet with HLP consistently outperformed those trained without HLP. Our source code will be made available on GitHub.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_21826
institution	arXiv
publishDate	2025
record_format	arxiv
title	Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging
topic	Sound; Machine Learning; Audio and Speech Processing
url	https://arxiv.org/abs/2503.21826
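
The contents field above describes Hierarchical Label Propagation (HLP) as propagating labels up the ontology hierarchy. A minimal sketch of that idea follows; it is not the authors' code. The function name `propagate_labels`, the child-to-parents mapping format, and the toy ontology are all assumptions made for illustration, and the real AudioSet ontology file is not loaded here.

```python
from typing import Dict, List, Set


def propagate_labels(labels: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given labels plus every ancestor reachable through `parents`.

    `parents` maps a class name to its direct parent classes (an assumed
    format; a class may have several parents or none).
    """
    result = set(labels)
    stack = list(labels)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in result:
                # A positive child implies a positive parent.
                result.add(parent)
                stack.append(parent)
    return result


# Hypothetical toy ontology (child -> direct parents), for illustration only.
TOY_PARENTS: Dict[str, List[str]] = {
    "Bark": ["Dog"],
    "Dog": ["Domestic animals, pets"],
    "Meow": ["Cat"],
    "Cat": ["Domestic animals, pets"],
    "Domestic animals, pets": ["Animal"],
}

clip_labels = {"Bark", "Speech"}
propagated = propagate_labels(clip_labels, TOY_PARENTS)
print(sorted(propagated))
```

In this sketch a clip tagged only with "Bark" also receives "Dog", "Domestic animals, pets", and "Animal", while "Speech", which has no parent in the toy mapping, is left unchanged. Labels are only ever added, never removed.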

Similar Items