Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Mishra, Mayank, Magron, Paul, Serizel, Romain
Format:	Preprint
Publié:	2025
Sujets:	Sound
Accès en ligne:	https://arxiv.org/abs/2511.07075
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866913162112008192
author	Mishra, Mayank Magron, Paul Serizel, Romain
author_facet	Mishra, Mayank Magron, Paul Serizel, Romain
contents	Spatial semantic segmentation of sound scenes (S5) consists of jointly performing audio source separation and sound event classification from a multichannel audio mixture. Evaluating S5 systems with separation and classification metrics individually makes system comparison difficult, whereas existing joint metrics, such as the class-aware signal-to-distortion ratio (CA-SDR), can conflate separation and labeling errors. In particular, CA-SDR relies on predicted class labels for source matching, which may obscure label swaps or misclassifications when the underlying source estimates remain perceptually correct. In this work, we introduce the class and source-aware signal-to-distortion ratio (CASA-SDR), a new metric that performs permutation-invariant source matching before computing classification errors, thereby shifting from a classification-focused approach to a separation-focused approach. We first analyze CA-SDR in controlled scenarios with oracle separation and synthetic classification errors, as well as under controlled cross-contamination between sources, and compare its behavior to that of the classical SDR and CASA-SDR. We also study the impact of classification errors on the metrics by introducing error-based and source-based aggregation strategies. Finally, we compare CA-SDR and CASA-SDR on systems submitted to Task 4 of the DCASE 2025 challenge, highlighting the cases where CA-SDR over-penalizes label swaps or poorly separated sources, while CASA-SDR provides a more interpretable separation-centric assessment of S5 performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_07075
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Metric Analysis for Spatial Semantic Segmentation of Sound Scenes Mishra, Mayank Magron, Paul Serizel, Romain Sound Spatial semantic segmentation of sound scenes (S5) consists of jointly performing audio source separation and sound event classification from a multichannel audio mixture. Evaluating S5 systems with separation and classification metrics individually makes system comparison difficult, whereas existing joint metrics, such as the class-aware signal-to-distortion ratio (CA-SDR), can conflate separation and labeling errors. In particular, CA-SDR relies on predicted class labels for source matching, which may obscure label swaps or misclassifications when the underlying source estimates remain perceptually correct. In this work, we introduce the class and source-aware signal-to-distortion ratio (CASA-SDR), a new metric that performs permutation-invariant source matching before computing classification errors, thereby shifting from a classification-focused approach to a separation-focused approach. We first analyze CA-SDR in controlled scenarios with oracle separation and synthetic classification errors, as well as under controlled cross-contamination between sources, and compare its behavior to that of the classical SDR and CASA-SDR. We also study the impact of classification errors on the metrics by introducing error-based and source-based aggregation strategies. Finally, we compare CA-SDR and CASA-SDR on systems submitted to Task 4 of the DCASE 2025 challenge, highlighting the cases where CA-SDR over-penalizes label swaps or poorly separated sources, while CASA-SDR provides a more interpretable separation-centric assessment of S5 performance.
title	Metric Analysis for Spatial Semantic Segmentation of Sound Scenes
topic	Sound
url	https://arxiv.org/abs/2511.07075

Documents similaires