| Main Authors: | Bi, Chunhao; Zhong, Houqiang; Xu, Zhixin; Song, Li; Cheng, Zhengxue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Sound |
| Online Access: | https://arxiv.org/abs/2604.08967 |
| _version_ | 1866915929769639936 |
|---|---|
| author | Bi, Chunhao; Zhong, Houqiang; Xu, Zhixin; Song, Li; Cheng, Zhengxue |
| author_facet | Bi, Chunhao; Zhong, Houqiang; Xu, Zhixin; Song, Li; Cheng, Zhengxue |
| contents | Spatial audio is fundamental to immersive virtual experiences, yet synthesizing high-fidelity binaural audio from sparse observations remains a significant challenge. Existing methods typically rely on implicit neural representations conditioned on visual priors, which often struggle to capture fine-grained acoustic structures. Inspired by 3D Gaussian Splatting (3DGS), we introduce AudioGS, a novel visual-free framework that explicitly encodes the sound field as a set of Audio Gaussians based on spectrograms. AudioGS associates each time-frequency bin with an Audio Gaussian equipped with dual Spherical Harmonic (SH) coefficients and a decay coefficient. For a target pose, we render binaural audio by evaluating the SH field to capture directionality, incorporating geometry-guided distance attenuation and phase correction, and reconstructing the waveform. Experiments on the Replay-NVAS dataset demonstrate that AudioGS successfully captures complex spatial cues and outperforms state-of-the-art visual-dependent baselines. Specifically, AudioGS reduces the magnitude reconstruction error (MAG) by over 14% and reduces the perceptual quality metric (DPAM) by approximately 25% compared to the best-performing visual-guided method. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2604_08967 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | AudioGS: Spectrogram-Based Audio Gaussian Splatting for Sound Field Reconstruction |
| topic | Sound |
| url | https://arxiv.org/abs/2604.08967 |
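
The rendering steps named in the abstract (SH evaluation for directionality, distance attenuation, phase correction) can be illustrated for a single time-frequency bin. This is a minimal sketch under stated assumptions, not the authors' implementation: the record does not give the SH degree, the attenuation law, or the phase model, so the degree-1 basis, the `exp(-decay * d) / d` attenuation, the delay-based phase term, and the names `real_sh_deg1` and `render_bin` are all hypothetical. It also assumes that "dual" SH coefficients means one set per ear.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s in air; an assumed constant


def real_sh_deg1(direction):
    """Real spherical harmonics up to degree 1 for a unit vector (x, y, z)."""
    x, y, z = direction
    c0 = 0.5 * math.sqrt(1.0 / math.pi)    # Y_0^0
    c1 = math.sqrt(3.0 / (4.0 * math.pi))  # common scale of the degree-1 terms
    return [c0, c1 * y, c1 * z, c1 * x]


def render_bin(source_pos, ear_pos, sh_coeffs, decay, freq_hz):
    """Complex value of one time-frequency bin at one ear (illustrative only)."""
    offset = [e - s for e, s in zip(ear_pos, source_pos)]
    dist = max(math.sqrt(sum(d * d for d in offset)), 1e-6)  # avoid divide-by-zero
    direction = [d / dist for d in offset]
    # Directionality: dot product of the SH coefficients with the SH basis.
    gain = sum(c * b for c, b in zip(sh_coeffs, real_sh_deg1(direction)))
    # Assumed attenuation: spherical spreading times an exponential decay term.
    attenuation = math.exp(-decay * dist) / dist
    # Assumed phase correction: propagation delay dist / c at this frequency.
    phase = cmath.exp(-2j * math.pi * freq_hz * dist / SPEED_OF_SOUND)
    return gain * attenuation * phase


# One hypothetical Audio Gaussian rendered to two ear positions.
left = render_bin((0.0, 0.0, 0.0), (1.0, 0.1, 0.0), [1.0, 0.2, 0.0, 0.0], 0.1, 1000.0)
right = render_bin((0.0, 0.0, 0.0), (1.0, -0.1, 0.0), [1.0, -0.2, 0.0, 0.0], 0.1, 1000.0)
```

In a full pipeline, values like `left` and `right` would be computed for every time-frequency bin and passed to an inverse STFT to reconstruct the waveform; that step is omitted here.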