Guardado en:
| Autores principales: | , , , , |
|---|---|
| Formato: | Preprint |
| Publicado: |
2026
|
| Materias: | |
| Acceso en línea: | https://arxiv.org/abs/2605.10398 |
| Etiquetas: |
Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
|
| _version_ | 1866911671537106944 |
|---|---|
| author | Erdem, Ege Koyama, Shoichi Nakamura, Tomohiko Das, Orchisama Cvetković, Zoran |
| author_facet | Erdem, Ege Koyama, Shoichi Nakamura, Tomohiko Das, Orchisama Cvetković, Zoran |
| contents | Reconstructing a 3D sound field from sparse microphone measurements is a fundamental yet ill-posed problem, which we address through Acoustic Transfer Function (ATF) magnitude estimation. ATF magnitude encapsulates key perceptual and acoustic properties of a physical space with applications in room characterization and correction. Although recent generative paradigms such as Flow Matching (FM) have achieved state-of-the-art performance in speech and music generation, their potential in spatial audio remains underexplored. We propose a novel framework for 3D ATF magnitude reconstruction as a guided generation task, with a 3D U-Net conditioned by a permutation-invariant set encoder. This architecture enables reconstruction from an arbitrary number of sparse inputs while leveraging the stable and efficient training properties of FM. Experimental results demonstrate that SF-Flow achieves accurate reconstruction up to \SI{1}{kHz}, trains substantially faster than the autoencoder baseline, and improves significantly with dataset size. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2605_10398 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements Erdem, Ege Koyama, Shoichi Nakamura, Tomohiko Das, Orchisama Cvetković, Zoran Audio and Speech Processing Reconstructing a 3D sound field from sparse microphone measurements is a fundamental yet ill-posed problem, which we address through Acoustic Transfer Function (ATF) magnitude estimation. ATF magnitude encapsulates key perceptual and acoustic properties of a physical space with applications in room characterization and correction. Although recent generative paradigms such as Flow Matching (FM) have achieved state-of-the-art performance in speech and music generation, their potential in spatial audio remains underexplored. We propose a novel framework for 3D ATF magnitude reconstruction as a guided generation task, with a 3D U-Net conditioned by a permutation-invariant set encoder. This architecture enables reconstruction from an arbitrary number of sparse inputs while leveraging the stable and efficient training properties of FM. Experimental results demonstrate that SF-Flow achieves accurate reconstruction up to \SI{1}{kHz}, trains substantially faster than the autoencoder baseline, and improves significantly with dataset size. |
| title | SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements |
| topic | Audio and Speech Processing |
| url | https://arxiv.org/abs/2605.10398 |