| Main Authors: | Lv, Yuanjun, Li, Hai, Yan, Ying, Liu, Junhui, Xie, Danming, Xie, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.08196 |
Similar Items
Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR
by: Shao, Mingchen, et al.
Published: (2025)
Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders
by: Ellinas, Nikolaos, et al.
Published: (2025)
Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis
by: Tian, Wenjie, et al.
Published: (2025)
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)
Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)
DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction
by: Fan, Cunhang, et al.
Published: (2025)
RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement
by: Liu, Mingshuai, et al.
Published: (2024)
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
by: Liu, Mingshuai, et al.
Published: (2024)
SSM2Mel: State Space Model to Reconstruct Mel Spectrogram from the EEG
by: Fan, Cunhang, et al.
Published: (2025)
Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
by: Wagner, Dominik, et al.
Published: (2023)
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
by: Du, Hui-Peng, et al.
Published: (2024)
Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation
by: Guo, Hongming, et al.
Published: (2024)
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
by: Ma, Linhan, et al.
Published: (2024)
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
by: Du, Chenpeng, et al.
Published: (2023)
Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
by: Yang, Yujie, et al.
Published: (2025)
Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
by: Shao, Mingchen, et al.
Published: (2025)
MelTok: 2D Tokenization for Single-Codebook Audio Compression
by: Li, Jingyi, et al.
Published: (2025)
UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
by: Guo, Yiwei, et al.
Published: (2024)
Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers
by: Masuyama, Yoshiki, et al.
Published: (2025)
Mel-RoFormer for Vocal Separation and Vocal Melody Transcription
by: Wang, Ju-Chiang, et al.
Published: (2024)
XEmoRAG: Cross-Lingual Emotion Transfer with Controllable Intensity Using Retrieval-Augmented Generation
by: Zuo, Tianlun, et al.
Published: (2025)
Enhancing Spectrogram Realism in Singing Voice Synthesis via Explicit Bandwidth Extension Prior to Vocoder
by: Yang, Runxuan, et al.
Published: (2025)
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)
Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
by: Zhang, Li, et al.
Published: (2025)
Improving Resource-Efficient Speech Enhancement via Neural Differentiable DSP Vocoder Refinement
by: Guimarães, Heitor R., et al.
Published: (2025)
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
by: Liu, Wenrui, et al.
Published: (2024)
MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
AudioGenie-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning
by: Rong, Yan, et al.
Published: (2025)
From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks
by: Miccini, Riccardo, et al.
Published: (2026)
Repurposing Image Diffusion Models for Training-Free Music Style Transfer on Mel-spectrograms
by: Wang, Heehwan, et al.
Published: (2024)
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
by: Yoneyama, Reo, et al.
Published: (2025)
Music Genre Classification: A Comparative Analysis of CNN and XGBoost Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms
by: Meng, Yigang
Published: (2024)
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling
by: Wang, Hui, et al.
Published: (2025)
Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label
by: Yang, Wenhao, et al.
Published: (2024)
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)
QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model
by: Chen, Shaowen, et al.
Published: (2025)