Bibliographic Details
Main Authors: Qiao, Yue; Kothapally, Vinay; Yu, Meng; Yu, Dong
Format: Preprint
Published: 2024
Subjects: Audio and Speech Processing
Online Access: https://arxiv.org/abs/2409.06954
_version_ 1866917776289955840
author Qiao, Yue
Kothapally, Vinay
Yu, Meng
Yu, Dong
author_facet Qiao, Yue
Kothapally, Vinay
Yu, Meng
Yu, Dong
contents Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage network architecture for encoding circular microphone array signals into second-order Ambisonics (SOA) in multi-speaker environments. In addition, we introduce: (i) a novel loss function based on spatial power maps to regularize inter-channel correlations of the Ambisonic signals, and (ii) a channel permutation technique to resolve the ambiguity of encoding vertical information using a horizontal circular array. Evaluation on simulated speech and noise datasets shows that our approach consistently outperforms traditional signal processing (SP) and DL-based methods, providing significantly better timbral and spatial quality and higher source localization accuracy. Binaural audio demos with visualizations are available at https://bridgoon97.github.io/NeuralAmbisonicEncoding/.
format Preprint
id arxiv_https___arxiv_org_abs_2409_06954
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array
Qiao, Yue
Kothapally, Vinay
Yu, Meng
Yu, Dong
Audio and Speech Processing
title Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array
topic Audio and Speech Processing
url https://arxiv.org/abs/2409.06954