Bibliographic Details
Main authors: Erdem, Ege; Koyama, Shoichi; Nakamura, Tomohiko; Das, Orchisama; Cvetković, Zoran
Format: Preprint
Published: 2026
Subjects: Audio and Speech Processing
Online access: https://arxiv.org/abs/2605.10398
author Erdem, Ege
Koyama, Shoichi
Nakamura, Tomohiko
Das, Orchisama
Cvetković, Zoran
contents Reconstructing a 3D sound field from sparse microphone measurements is a fundamental yet ill-posed problem, which we address through Acoustic Transfer Function (ATF) magnitude estimation. ATF magnitude encapsulates key perceptual and acoustic properties of a physical space, with applications in room characterization and correction. Although recent generative paradigms such as Flow Matching (FM) have achieved state-of-the-art performance in speech and music generation, their potential in spatial audio remains underexplored. We propose SF-Flow, a framework that formulates 3D ATF magnitude reconstruction as a guided generation task, using a 3D U-Net conditioned by a permutation-invariant set encoder. This architecture enables reconstruction from an arbitrary number of sparse inputs while leveraging the stable and efficient training properties of FM. Experimental results demonstrate that SF-Flow achieves accurate reconstruction up to 1 kHz, trains substantially faster than the autoencoder baseline, and improves significantly with dataset size.
format Preprint
id arxiv_https___arxiv_org_abs_2605_10398
institution arXiv
publishDate 2026
record_format arxiv
title SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements
topic Audio and Speech Processing
url https://arxiv.org/abs/2605.10398
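
The abstract names two ingredients: a permutation-invariant set encoder for the sparse measurements, and Flow Matching as the training objective. The sketch below illustrates both in plain Python. It is not the authors' implementation: the mean-pooling encoder, the toy feature map, the linear interpolation path, and all function names (`encode_set`, `flow_matching_pair`, `fm_loss`) are assumptions chosen for illustration, and the record does not say which path or encoder the paper actually uses.

```python
def encode_set(measurements):
    """Encode a set of measurements into one fixed-size vector.

    Each measurement is a hypothetical (x, y, z, magnitude) tuple.
    Assumption: mean pooling over per-element embeddings, one common
    way to obtain permutation invariance.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    # Toy per-element embedding standing in for a learned network.
    embedded = [(x, y, z, m, m * m) for (x, y, z, m) in measurements]
    n = len(embedded)
    # The mean is symmetric in its inputs, so ordering does not matter,
    # and the output size is the same for any number of measurements.
    return tuple(sum(column) / n for column in zip(*embedded))


def flow_matching_pair(x0, x1, t):
    """Return (x_t, target velocity) for a linear path from x0 to x1.

    Assumption: x0 is a noise sample, x1 is a data sample, and the
    path is x_t = (1 - t) * x0 + t * x1, whose velocity is x1 - x0.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def fm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)
```

In a full model, a network would take `x_t`, `t`, and the output of `encode_set` as inputs and be trained to minimize `fm_loss` against `v_target`. Because the encoder output has a fixed size, the same network can be conditioned on any number of microphone measurements.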