MARC21: :: Library Catalog

Bibliographic Details
Main authors:	Llave, Adrien; Granier, Emma; Pallone, Grégory
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Machine Learning
Online access:	https://arxiv.org/abs/2509.16715

_version_	1866908774748389376
author	Llave, Adrien; Granier, Emma; Pallone, Grégory
author_facet	Llave, Adrien; Granier, Emma; Pallone, Grégory
contents	In the development of spatial audio technologies, reliable and shared methods for evaluating audio quality are essential. Listening tests are currently the standard but remain costly in terms of time and resources. Several models predicting subjective scores have been proposed, but they do not generalize well to real-world signals. In this paper, we propose QASTAnet (Quality Assessment for SpaTial Audio network), a new metric based on a deep neural network, specialized for spatial audio (ambisonics and binaural). As training data is scarce, we aim for the model to be trainable with a small amount of data. To do so, we propose to rely on expert modeling of the low-level auditory system and use a neural network to model the high-level cognitive function of the quality judgement. We compare its performance to two reference metrics on a wide range of content types (speech, music, ambiance, anechoic, reverberated), with a focus on codec artifacts. Results demonstrate that QASTAnet overcomes the aforementioned limitations of the existing methods. The strong correlation between the proposed metric prediction and subjective scores makes it a good candidate for comparing codecs in their development.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_16715
institution	arXiv
publishDate	2025
record_format	arxiv
title	QASTAnet: A DNN-based Quality Metric for Spatial Audio
topic	Audio and Speech Processing; Machine Learning
url	https://arxiv.org/abs/2509.16715

Similar Items