Staff View: Multitask frame-level learning for few-shot sound event detection :: Library Catalog

Bibliographic Details
Main Authors:	Zou, Liang; Yan, Genwei; Wang, Ruoyu; Du, Jun; Lei, Meng; Gao, Tian; Fang, Xin
Format:	Preprint
Published:	2024
Subjects:	Sound; Computer Vision and Pattern Recognition; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2403.11091

_version_	1866929281968373760
author	Zou, Liang; Yan, Genwei; Wang, Ruoyu; Du, Jun; Lei, Meng; Gao, Tian; Fang, Xin
author_facet	Zou, Liang; Yan, Genwei; Wang, Ruoyu; Du, Jun; Lei, Meng; Gao, Tian; Fang, Xin
contents	This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often struggle to provide detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been proposed to overcome these limitations, these strategies commonly face difficulties with prediction truncation caused by background noise. To alleviate this issue, we introduce an innovative multitask frame-level SED framework. In addition, we introduce TimeFilterAug, a linear timing mask for data augmentation, to increase the model's robustness and adaptability to diverse acoustic environments. The proposed method achieves an F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events Challenge 2023.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_11091
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Multitask frame-level learning for few-shot sound event detection Zou, Liang Yan, Genwei Wang, Ruoyu Du, Jun Lei, Meng Gao, Tian Fang, Xin Sound Computer Vision and Pattern Recognition Audio and Speech Processing This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often struggle to provide detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been proposed to overcome these limitations, these strategies commonly face difficulties with prediction truncation caused by background noise. To alleviate this issue, we introduce an innovative multitask frame-level SED framework. In addition, we introduce TimeFilterAug, a linear timing mask for data augmentation, to increase the model's robustness and adaptability to diverse acoustic environments. The proposed method achieves an F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events Challenge 2023.
title	Multitask frame-level learning for few-shot sound event detection
topic	Sound; Computer Vision and Pattern Recognition; Audio and Speech Processing
url	https://arxiv.org/abs/2403.11091

Similar Items