:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Li, Andong, Lei, Tong, Sun, Zhihang, Chen, Rilin, Yin, Erwei, Li, Xiaodong, Zheng, Chengshi
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2507.20731
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Scalable Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2026)

Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)

BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective
by: Li, Andong, et al.
Published: (2025)

GOMPSNR: Reflourish the Signal-to-Noise Ratio Metric for Audio Generation Tasks
by: Dai, Lingling, et al.
Published: (2026)

SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression
by: Sun, Zhihang, et al.
Published: (2024)

NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
by: Liang, Yifan, et al.
Published: (2025)

Target matching based generative model for speech enhancement
by: Wang, Taihui, et al.
Published: (2025)

BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement
by: Fan, Cunhang, et al.
Published: (2024)

From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models
by: Mu, Zhaoxi, et al.
Published: (2025)

Gen-SER: When the generative model meets speech emotion recognition
by: Wang, Taihui, et al.
Published: (2026)

BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
by: Du, Hui-Peng, et al.
Published: (2024)

End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)

High-Fidelity Music Vocoder using Neural Audio Codecs
by: Lanzendörfer, Luca A., et al.
Published: (2025)

FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
by: Lv, Yuanjun, et al.
Published: (2024)

Video-to-Audio Generation with Fine-grained Temporal Semantics
by: Hu, Yuchen, et al.
Published: (2024)

ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)

Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders
by: Ellinas, Nikolaos, et al.
Published: (2025)

QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model
by: Chen, Shaowen, et al.
Published: (2025)

Enhancing Spectrogram Realism in Singing Voice Synthesis via Explicit Bandwidth Extension Prior to Vocoder
by: Yang, Runxuan, et al.
Published: (2025)

LTA-L2S: Lexical Tone-Aware Lip-to-Speech Synthesis for Mandarin with Cross-Lingual Transfer Learning
by: Yang, Kang, et al.
Published: (2025)

AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)

Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction
by: Wang, Chengzhong, et al.
Published: (2026)

Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum
by: Al-Radhi, Mohammed Salah, et al.
Published: (2026)

Improving Resource-Efficient Speech Enhancement via Neural Differentiable DSP Vocoder Refinement
by: Guimarães, Heitor R., et al.
Published: (2025)

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
by: Chen, Shihao, et al.
Published: (2024)

RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer
by: Hong, Seongho, et al.
Published: (2025)

Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
by: Agrawal, Prabhav, et al.
Published: (2024)

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
by: Shibuya, Takashi, et al.
Published: (2023)

Rethinking the joint estimation of magnitude and phase for time-frequency domain neural vocoders
by: Dai, Lingling, et al.
Published: (2025)

Vocoder-Projected Feature Discriminator
by: Kaneko, Takuhiro, et al.
Published: (2025)

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
by: Guo, Yiwei, et al.
Published: (2024)

Deep Learning for Personalized Binaural Audio Reproduction
by: Lu, Xikun, et al.
Published: (2025)

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
by: Du, Chenpeng, et al.
Published: (2023)

SemanticVocoder: Bridging Audio Generation and Audio Understanding via Semantic Latents
by: Xie, Zeyu, et al.
Published: (2026)

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation
by: Chen, Shihao, et al.
Published: (2024)

Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
by: Yoneyama, Reo, et al.
Published: (2025)

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
by: Ren, Yong, et al.
Published: (2024)

MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)

Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech Generation from SSL features
by: Ohnaka, Hien, et al.
Published: (2026)