:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Shen, Rubing, Ren, Yanzhen, Sun, Zongkun
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.04575
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Is GAN Necessary for Mel-Spectrogram-based Neural Vocoder?
by: Du, Hui-Peng, et al.
Published: (2025)

A Universal Harmonic Discriminator for High-quality GAN-based Vocoder
by: Xu, Nan, et al.
Published: (2025)

QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model
by: Chen, Shaowen, et al.
Published: (2025)

VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
by: Cao, Yubing, et al.
Published: (2024)

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
by: Shibuya, Takashi, et al.
Published: (2023)

WaveNeXt 2: ConvNeXt-Based Fast Neural Vocoders With Residual Denoising and Sub-Modeling for GAN and Diffusion Models
by: Zhou, Wangzixi, et al.
Published: (2026)

Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation
by: Yao, Zengwei, et al.
Published: (2025)

Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN
by: Zhang, Shiqi, et al.
Published: (2024)

Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)

A Distilled Low-Latency Neural Vocoder with Explicit Amplitude and Phase Prediction
by: Du, Hui-Peng, et al.
Published: (2025)

GAN-Based Multi-Microphone Spatial Target Speaker Extraction
by: Shetu, Shrishti Saha, et al.
Published: (2025)

DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration
by: Serbest, Sanberk, et al.
Published: (2025)

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
by: Cho, Hyunjae, et al.
Published: (2024)

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
by: Nguyen, Tan Dat, et al.
Published: (2024)

Leveraging Discriminative Latent Representations for Conditioning GAN-Based Speech Enhancement
by: Shetu, Shrishti Saha, et al.
Published: (2025)

MaskCycleGAN-based Whisper to Normal Speech Conversion
by: Gupta, K. Rohith, et al.
Published: (2024)

Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
by: Liu, Wuyang, et al.
Published: (2023)

BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
by: Du, Hui-Peng, et al.
Published: (2024)

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
by: Baoueb, Teysir, et al.
Published: (2024)

TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
by: Ma, Chengyuan, et al.
Published: (2026)

A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions
by: Du, Hui-Peng, et al.
Published: (2024)

DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)

Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN
by: Gu, Yicheng, et al.
Published: (2025)

AudioGAN: A Compact and Efficient Framework for Real-Time High-Fidelity Text-to-Audio Generation
by: Chung, HaeChun
Published: (2025)

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)

MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)

Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction
by: Valin, Jean-Marc, et al.
Published: (2024)

Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)

Disentanglement in a GAN for Unconditional Speech Synthesis
by: Baas, Matthew, et al.
Published: (2023)

Ultra-Low-Bitrate Mel-Spectrogram-based Neural Speech Coding with Flow-Matching-based Refinement and Vocoding-driven Reconstruction
by: Du, Hui-Peng, et al.
Published: (2026)

GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning
by: Shetu, Shrishti Saha, et al.
Published: (2024)

Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction
by: Du, Renmingyue, et al.
Published: (2024)

Non-Causal to Causal SSL-Supported Transfer Learning: Towards a High-Performance Low-Latency Speech Vocoder
by: Shi, Renzheng, et al.
Published: (2024)

CMGAN: Conformer-based Metric GAN for Speech Enhancement
by: Cao, Ruizhe, et al.
Published: (2022)

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
by: Du, Chenpeng, et al.
Published: (2023)

ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)

Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
by: Yoneyama, Reo, et al.
Published: (2025)

FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
by: Lv, Yuanjun, et al.
Published: (2024)

Leveraging Self-Supervised Audio-Visual Pretrained Models to Improve Vocoded Speech Intelligibility in Cochlear Implant Simulation
by: Lai, Richard Lee, et al.
Published: (2023)

EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
by: Zhang, Tong, et al.
Published: (2025)