Bibliographic Details
Main Authors: Shibata, Yuto, Tanaka, Keitaro, Bando, Yoshiaki, Imoto, Keisuke, Kataoka, Hirokatsu, Aoki, Yoshimitsu
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2504.04428
_version_ 1866909568191168512
author Shibata, Yuto
Tanaka, Keitaro
Bando, Yoshiaki
Imoto, Keisuke
Kataoka, Hirokatsu
Aoki, Yoshimitsu
author_facet Shibata, Yuto
Tanaka, Keitaro
Bando, Yoshiaki
Imoto, Keisuke
Kataoka, Hirokatsu
Aoki, Yoshimitsu
contents In this paper, we propose a novel formula-driven supervised learning (FDSL) framework for pre-training an environmental sound analysis model by leveraging acoustic signals parametrically synthesized through formula-driven methods. Specifically, we outline detailed procedures and evaluate their effectiveness for sound event detection (SED). The SED task, which involves estimating the types and timings of sound events, is hampered by the difficulty of acquiring a sufficient quantity of accurately labeled training data. Moreover, it is well known that manually annotated labels often contain noise and are significantly influenced by the subjective judgment of annotators. To address these challenges, we propose a novel pre-training method that utilizes a synthetic dataset, Formula-SED, in which acoustic data are generated solely from mathematical formulas. The proposed method enables large-scale pre-training by using the synthesis parameters applied at each time step as ground truth labels, thereby eliminating label noise and bias. We demonstrate that large-scale pre-training with Formula-SED significantly enhances model accuracy and accelerates training, as evidenced by our results on the DESED dataset used for the DCASE2023 Challenge Task 4. The project page is at https://yutoshibata07.github.io/Formula-SED/
format Preprint
id arxiv_https___arxiv_org_abs_2504_04428
institution arXiv
publishDate 2025
record_format arxiv
title Formula-Supervised Sound Event Detection: Pre-Training Without Real Data
topic Sound
Artificial Intelligence
url https://arxiv.org/abs/2504.04428