Staff View: Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array :: Library Catalog

Bibliographic Details
Main Authors:	Qiao, Yue; Kothapally, Vinay; Yu, Meng; Yu, Dong
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.06954

_version_	1866917776289955840
author	Qiao, Yue; Kothapally, Vinay; Yu, Meng; Yu, Dong
author_facet	Qiao, Yue; Kothapally, Vinay; Yu, Meng; Yu, Dong
contents	Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage network architecture for encoding circular microphone array signals into second-order Ambisonics (SOA) in multi-speaker environments. In addition, we introduce: (i) a novel loss function based on spatial power maps to regularize inter-channel correlations of the Ambisonic signals, and (ii) a channel permutation technique to resolve the ambiguity of encoding vertical information using a horizontal circular array. Evaluation on simulated speech and noise datasets shows that our approach consistently outperforms traditional signal processing (SP) and DL-based methods, providing significantly better timbral and spatial quality and higher source localization accuracy. Binaural audio demos with visualizations are available at https://bridgoon97.github.io/NeuralAmbisonicEncoding/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_06954
institution	arXiv
publishDate	2024
record_format	arxiv
title	Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2409.06954

Similar Items