| Main Authors: | Li, Andong, Lei, Tong, Sun, Zhihang, Chen, Rilin, Yin, Erwei, Li, Xiaodong, Zheng, Chengshi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.20731 |
Similar Items
Scalable Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2026)
Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)
BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective
by: Li, Andong, et al.
Published: (2025)
GOMPSNR: Reflourish the Signal-to-Noise Ratio Metric for Audio Generation Tasks
by: Dai, Lingling, et al.
Published: (2026)
SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression
by: Sun, Zhihang, et al.
Published: (2024)
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
by: Liang, Yifan, et al.
Published: (2025)
Target matching based generative model for speech enhancement
by: Wang, Taihui, et al.
Published: (2025)
BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement
by: Fan, Cunhang, et al.
Published: (2024)
From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models
by: Mu, Zhaoxi, et al.
Published: (2025)
Gen-SER: When the generative model meets speech emotion recognition
by: Wang, Taihui, et al.
Published: (2026)
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
by: Du, Hui-Peng, et al.
Published: (2024)
End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)
High-Fidelity Music Vocoder using Neural Audio Codecs
by: Lanzendörfer, Luca A., et al.
Published: (2025)
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
by: Lv, Yuanjun, et al.
Published: (2024)
Video-to-Audio Generation with Fine-grained Temporal Semantics
by: Hu, Yuchen, et al.
Published: (2024)
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)
Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders
by: Ellinas, Nikolaos, et al.
Published: (2025)
QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model
by: Chen, Shaowen, et al.
Published: (2025)
Enhancing Spectrogram Realism in Singing Voice Synthesis via Explicit Bandwidth Extension Prior to Vocoder
by: Yang, Runxuan, et al.
Published: (2025)
LTA-L2S: Lexical Tone-Aware Lip-to-Speech Synthesis for Mandarin with Cross-Lingual Transfer Learning
by: Yang, Kang, et al.
Published: (2025)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)
Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction
by: Wang, Chengzhong, et al.
Published: (2026)
Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum
by: Al-Radhi, Mohammed Salah, et al.
Published: (2026)
Improving Resource-Efficient Speech Enhancement via Neural Differentiable DSP Vocoder Refinement
by: Guimarães, Heitor R., et al.
Published: (2025)
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
by: Chen, Shihao, et al.
Published: (2024)
RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer
by: Hong, Seongho, et al.
Published: (2025)
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
by: Agrawal, Prabhav, et al.
Published: (2024)
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
by: Shibuya, Takashi, et al.
Published: (2023)
Rethinking the joint estimation of magnitude and phase for time-frequency domain neural vocoders
by: Dai, Lingling, et al.
Published: (2025)
Vocoder-Projected Feature Discriminator
by: Kaneko, Takuhiro, et al.
Published: (2025)
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
by: Guo, Yiwei, et al.
Published: (2024)
Deep Learning for Personalized Binaural Audio Reproduction
by: Lu, Xikun, et al.
Published: (2025)
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
by: Du, Chenpeng, et al.
Published: (2023)
SemanticVocoder: Bridging Audio Generation and Audio Understanding via Semantic Latents
by: Xie, Zeyu, et al.
Published: (2026)
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation
by: Chen, Shihao, et al.
Published: (2024)
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
by: Yoneyama, Reo, et al.
Published: (2025)
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
by: Ren, Yong, et al.
Published: (2024)
MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)
Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech Generation from SSL features
by: Ohnaka, Hien, et al.
Published: (2026)