Bibliographic Details
Main Authors: Ryu, Myeonghoon, Oh, Hongseok, Lee, Suji, Park, Han
Format: Preprint
Published: 2024
Subjects: Sound; Machine Learning; Multimedia; Audio and Speech Processing
Online Access: https://arxiv.org/abs/2410.18322
author Ryu, Myeonghoon
Oh, Hongseok
Lee, Suji
Park, Han
contents We present Unified Microphone Conversion, a generative framework designed to make sound event classification (SEC) systems robust to device variability. Our prior CycleGAN-based methods simulate device characteristics effectively but require a separate model for each device pair, which limits scalability. Our approach removes this constraint by conditioning the generator on frequency response data, enabling many-to-many device mappings through unpaired training. We integrate the frequency-response information via Feature-wise Linear Modulation (FiLM), further enhancing scalability. Additionally, incorporating synthetic frequency response differences makes the framework more applicable to real-world use. Experimental results show that our method outperforms the state of the art by 2.6% and reduces variability by 0.8% in macro-average F1 score.
format Preprint
id arxiv_https___arxiv_org_abs_2410_18322
institution arXiv
publishDate 2024
record_format arxiv
title Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
topic Sound
Machine Learning
Multimedia
Audio and Speech Processing
url https://arxiv.org/abs/2410.18322
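note The abstract describes conditioning a generator on frequency response data via Feature-wise Linear Modulation. The general FiLM operation (a per-channel scale and shift predicted from a conditioning vector) might look like the sketch below. This is not the authors' implementation: the function name `film`, the single linear projection, the tensor shapes, and the idea of passing a frequency-response difference as a flat vector are all assumptions made for illustration.

```python
import numpy as np

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: out = gamma(cond) * features + beta(cond), per channel.

    features: (batch, channels, freq, time) feature maps (assumed layout)
    cond:     (batch, cond_dim) conditioning vector, e.g. a hypothetical
              frequency-response difference between two devices
    w_*, b_*: weights (cond_dim, channels) and biases (channels,) of the
              linear projections that predict the scale and shift
    """
    gamma = cond @ w_gamma + b_gamma  # (batch, channels) scale
    beta = cond @ w_beta + b_beta     # (batch, channels) shift
    # Broadcast the per-channel scale and shift over the freq and time axes.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

# Illustrative shapes only; none of these sizes come from the paper.
rng = np.random.default_rng(0)
batch, channels, n_freq, n_time, cond_dim = 2, 4, 8, 16, 32
features = rng.standard_normal((batch, channels, n_freq, n_time))
cond = rng.standard_normal((batch, cond_dim))
out = film(
    features,
    cond,
    0.1 * rng.standard_normal((cond_dim, channels)),
    np.ones(channels),
    0.1 * rng.standard_normal((cond_dim, channels)),
    np.zeros(channels),
)
print(out.shape)
```

Because the scale and shift depend only on the conditioning vector, a single set of generator weights can in principle serve many device pairs, which is the scalability argument the abstract makes against one model per pair.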