Bibliographic Details
Main Authors: Dai, Lingling, Li, Andong, Chi, Cheng, Liang, Yifan, Li, Xiaodong, Zheng, Chengshi
Format: Preprint
Published: 2026
Subjects: Sound
Online Access: https://arxiv.org/abs/2601.13758
author Dai, Lingling
Li, Andong
Chi, Cheng
Liang, Yifan
Li, Xiaodong
Zheng, Chengshi
contents In audio generation, signal-to-noise ratio (SNR) has long served as an objective metric for evaluating audio quality. Nevertheless, recent studies have shown that SNR and its variants do not always correlate well with human perception, which raises two questions: why does SNR fail to measure audio quality, and how can its reliability as an objective metric be improved? In this paper, we identify the inadequate measurement of phase distance as a pivotal factor and reformulate SNR with specially designed phase-distance terms, yielding an improved metric named GOMPSNR. We further extend the proposed formulation to derive two novel categories of loss function, corresponding to magnitude-guided phase refinement and joint magnitude-phase optimization, respectively. In addition, we conduct extensive experiments to find an optimal combination of different loss functions. Experimental results on advanced neural vocoders demonstrate that GOMPSNR measures error more reliably than SNR. Meanwhile, the proposed loss functions yield substantial improvements in model performance, and a well-chosen combination of loss functions further improves overall model capability.
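For orientation, the conventional waveform-domain SNR that the abstract takes as its starting point can be sketched as below. This is only the standard definition (reference power over error power, in dB), not the GOMPSNR reformulation, whose phase-distance terms are not given in this record. The function name `snr_db`, the `eps` guard, and the 440 Hz test tones are illustrative assumptions.

```python
import numpy as np


def snr_db(reference, estimate, eps=1e-12):
    """Conventional SNR in dB between a reference and an estimated waveform.

    `eps` is an assumed small constant that avoids division by zero and log(0).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate          # sample-wise error signal
    signal_power = np.sum(reference ** 2)
    noise_power = np.sum(noise ** 2)
    return 10.0 * np.log10((signal_power + eps) / (noise_power + eps))


# Illustrative signals (assumed): a 440 Hz tone, one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)

# Same magnitude spectrum, phase offset of 90 degrees.
shifted = np.sin(2 * np.pi * 440 * t + np.pi / 2)
# Same phase, amplitude reduced by 10 percent.
scaled = 0.9 * clean

snr_shifted = snr_db(clean, shifted)
snr_scaled = snr_db(clean, scaled)
print(f"phase-shifted tone: {snr_shifted:.2f} dB")
print(f"amplitude-scaled tone: {snr_scaled:.2f} dB")
```

In this toy case the phase-shifted tone scores about -3 dB while the amplitude-scaled tone scores 20 dB. Both copies have the same pitch and a similar level, so the example gives one simple view of how waveform SNR handles phase differences, which is the aspect the abstract proposes to measure differently.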
format Preprint
id arxiv_https___arxiv_org_abs_2601_13758
institution arXiv
publishDate 2026
record_format arxiv
title GOMPSNR: Reflourish the Signal-to-Noise Ratio Metric for Audio Generation Tasks
topic Sound
url https://arxiv.org/abs/2601.13758