Staff View: Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation :: Library Catalog

Bibliographic Details
Main Authors:	Ryu, Myeonghoon; Oh, Hongseok; Lee, Suji; Park, Han
Format:	Preprint
Published:	2024
Subjects:	Sound; Machine Learning; Multimedia; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2410.18322

_version_	1866915295333974016
author	Ryu, Myeonghoon; Oh, Hongseok; Lee, Suji; Park, Han
author_facet	Ryu, Myeonghoon; Oh, Hongseok; Lee, Suji; Park, Han
contents	We present Unified Microphone Conversion, a unified generative framework designed to make sound event classification (SEC) systems robust to device variability. While our prior CycleGAN-based methods effectively simulate device characteristics, they require a separate model for each device pair, which limits scalability. Our approach overcomes this constraint by conditioning the generator on frequency-response data, enabling many-to-many device mappings through unpaired training. We integrate the frequency-response information via Feature-wise Linear Modulation, further enhancing scalability. Additionally, incorporating synthetic frequency-response differences improves the framework's applicability to real-world use. Experimental results show that our method outperforms the state of the art by 2.6% and reduces variability by 0.8% in macro-average F1 score.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_18322
institution	arXiv
publishDate	2024
record_format	arxiv
title	Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
topic	Sound; Machine Learning; Multimedia; Audio and Speech Processing
url	https://arxiv.org/abs/2410.18322
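
Note on the conditioning mechanism named in the abstract: Feature-wise Linear Modulation (FiLM) scales and shifts each feature channel using values predicted from a conditioning input. The NumPy sketch below illustrates only that general idea and is not the authors' implementation. The function name `film`, the tensor shapes, the linear projections, and the random vector standing in for a frequency-response curve are all assumptions made for illustration.

```python
import numpy as np


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: per-channel scale and shift predicted from `cond`.

    features: (channels, freq, time) feature map (assumed layout)
    cond:     (cond_dim,) conditioning vector, e.g. a frequency-response curve
    w_gamma, w_beta: (channels, cond_dim) projection weights (hypothetical)
    b_gamma, b_beta: (channels,) projection biases (hypothetical)
    """
    gamma = w_gamma @ cond + b_gamma  # per-channel scale, shape (channels,)
    beta = w_beta @ cond + b_beta     # per-channel shift, shape (channels,)
    # Broadcast the per-channel values over the freq and time axes.
    return gamma[:, None, None] * features + beta[:, None, None]


# Toy inputs; every size here is an arbitrary choice for the sketch.
rng = np.random.default_rng(0)
channels, n_freq, n_time, cond_dim = 4, 8, 16, 32
features = rng.standard_normal((channels, n_freq, n_time))
cond = rng.standard_normal(cond_dim)  # stand-in for a frequency response

w_gamma = 0.1 * rng.standard_normal((channels, cond_dim))
w_beta = 0.1 * rng.standard_normal((channels, cond_dim))
b_gamma = np.ones(channels)   # bias of 1 keeps the scale near identity
b_beta = np.zeros(channels)

modulated = film(features, cond, w_gamma, b_gamma, w_beta, b_beta)

# With zero projection weights, gamma = 1 and beta = 0, so FiLM is the identity.
zero_w = np.zeros((channels, cond_dim))
identity = film(features, cond, zero_w, b_gamma, zero_w, b_beta)
```

Because the conditioning vector is an input rather than part of the weights, one set of parameters can in principle serve many target devices, which is the scalability argument the abstract makes.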
