Staff View: Formula-Supervised Sound Event Detection: Pre-Training Without Real Data :: Library Catalog

Bibliographic Details
Main Authors:	Shibata, Yuto; Tanaka, Keitaro; Bando, Yoshiaki; Imoto, Keisuke; Kataoka, Hirokatsu; Aoki, Yoshimitsu
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.04428

_version_	1866909568191168512
author	Shibata, Yuto; Tanaka, Keitaro; Bando, Yoshiaki; Imoto, Keisuke; Kataoka, Hirokatsu; Aoki, Yoshimitsu
author_facet	Shibata, Yuto; Tanaka, Keitaro; Bando, Yoshiaki; Imoto, Keisuke; Kataoka, Hirokatsu; Aoki, Yoshimitsu
contents	In this paper, we propose a novel formula-driven supervised learning (FDSL) framework for pre-training an environmental sound analysis model on acoustic signals synthesized parametrically from mathematical formulas. Specifically, we describe the procedures in detail and evaluate their effectiveness for sound event detection (SED). The SED task, which involves estimating the types and timings of sound events, is hampered in particular by the difficulty of acquiring enough accurately labeled training data. Moreover, manually annotated labels are well known to contain noise and to be significantly influenced by the subjective judgment of annotators. To address these challenges, we propose a novel pre-training method that uses a synthetic dataset, Formula-SED, in which acoustic data are generated solely from mathematical formulas. The proposed method enables large-scale pre-training by using the synthesis parameters applied at each time step as ground-truth labels, thereby eliminating label noise and bias. We demonstrate that large-scale pre-training with Formula-SED significantly improves model accuracy and accelerates training, as evidenced by our results on the DESED dataset used for the DCASE2023 Challenge Task 4. The project page is at https://yutoshibata07.github.io/Formula-SED/
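The abstract's central idea is that, because every sound is generated from known parameters, the parameters active at each time step can serve directly as frame-level ground-truth labels. The sketch below illustrates that idea under toy assumptions only: plain sine oscillators stand in for formula-defined sound classes, and the sample rate, hop size, class frequencies, and all names (`Event`, `sample_events`, `synthesize`, `frame_labels`) are hypothetical. It is not the Formula-SED generator described in the paper.

```python
import math
import random
from dataclasses import dataclass

# Toy assumptions (hypothetical, not taken from the paper):
SAMPLE_RATE = 16_000          # Hz
CLIP_SECONDS = 2.0
HOP = 160                     # samples per label frame -> 100 frames per second
CLASS_FREQS = [220.0, 440.0, 880.0, 1760.0]  # one sine "class" per frequency
N_SAMPLES = int(SAMPLE_RATE * CLIP_SECONDS)
N_FRAMES = N_SAMPLES // HOP


@dataclass
class Event:
    cls: int        # index into CLASS_FREQS
    onset: float    # seconds
    offset: float   # seconds
    amp: float      # linear amplitude


def _sample_span(ev: Event) -> tuple[int, int]:
    """Half-open sample range [start, stop) in which the event is active."""
    start = int(ev.onset * SAMPLE_RATE)
    stop = min(int(ev.offset * SAMPLE_RATE), N_SAMPLES)
    return start, stop


def sample_events(rng: random.Random, n_events: int) -> list[Event]:
    """Draw random synthesis parameters; these double as the labels."""
    events = []
    for _ in range(n_events):
        onset = rng.uniform(0.0, CLIP_SECONDS - 0.2)
        offset = rng.uniform(onset + 0.1, CLIP_SECONDS)
        cls = rng.randrange(len(CLASS_FREQS))
        events.append(Event(cls, onset, offset, rng.uniform(0.1, 0.3)))
    return events


def synthesize(events: list[Event]) -> list[float]:
    """Render the clip by summing one sine oscillator per event."""
    audio = [0.0] * N_SAMPLES
    for ev in events:
        start, stop = _sample_span(ev)
        freq = CLASS_FREQS[ev.cls]
        for i in range(start, stop):
            audio[i] += ev.amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
    return audio


def frame_labels(events: list[Event]) -> list[list[int]]:
    """Multi-hot label per frame, derived from the same parameters as the audio.

    Frame t covers samples [t*HOP, (t+1)*HOP); it is marked active for a class
    if any event of that class overlaps it, so labels are exact by construction.
    """
    labels = [[0] * len(CLASS_FREQS) for _ in range(N_FRAMES)]
    for ev in events:
        start, stop = _sample_span(ev)
        first = start // HOP
        last = min((stop - 1) // HOP, N_FRAMES - 1)
        for t in range(first, last + 1):
            labels[t][ev.cls] = 1
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    events = sample_events(rng, n_events=3)
    audio, labels = synthesize(events), frame_labels(events)
    print(len(audio), "samples,", len(labels), "label frames")
```

Because labels and audio are computed from one shared parameter set, no annotator is involved and onsets and offsets are exact to the frame resolution; this is the property the abstract credits with removing label noise and bias. A real generator would replace the sine oscillators with richer parametric synthesis.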
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_04428
institution	arXiv
publishDate	2025
record_format	arxiv
title	Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
topic	Sound; Artificial Intelligence
url	https://arxiv.org/abs/2504.04428
