| Main Authors: | Nguyen, Tan Dat, Kim, Ji-Hoon, Jang, Youngjoon, Kim, Jaehun, Chung, Joon Son |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.10032 |
Similar Items
SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS
by: Nguyen, Tan Dat, et al.
Published: (2025)
MamTra: A Hybrid Mamba-Transformer Backbone for Speech Synthesis
by: Nguyen, Tan Dat, et al.
Published: (2026)
AdaptVC: High Quality Voice Conversion with Adaptive Learning
by: Kim, Jaehun, et al.
Published: (2025)
VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
by: Jung, Jaemin, et al.
Published: (2024)
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
by: Kim, Ji-Hoon, et al.
Published: (2023)
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
by: Hono, Yukiya, et al.
Published: (2024)
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)
LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling
by: Kwak, Doyeop, et al.
Published: (2025)
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)
EDNet: A Versatile Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training
by: Kwak, Doyeop, et al.
Published: (2025)
MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)
SCORE: Scaling audio generation using Standardized COmposite REwards
by: Jung, Jaemin, et al.
Published: (2025)
Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation
by: Kim, Miseul, et al.
Published: (2024)
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
by: Choi, Jeongsoo, et al.
Published: (2025)
A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions
by: Du, Hui-Peng, et al.
Published: (2024)
InfiniteAudio: Infinite-Length Audio Generation with Consistency
by: Jung, Chaeyoung, et al.
Published: (2025)
QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model
by: Chen, Shaowen, et al.
Published: (2025)
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
by: Nguyen, Tan Dat, et al.
Published: (2024)
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
by: Jung, Chaeyoung, et al.
Published: (2024)
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
by: Kim, Ji-Hoon, et al.
Published: (2025)
PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement
by: Zhou, Nan, et al.
Published: (2024)
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
by: Choi, Jeongsoo, et al.
Published: (2025)
Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)
GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
by: Liu, Haocheng, et al.
Published: (2024)
GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
by: Baoueb, Teysir, et al.
Published: (2025)
Probing Cross-modal Information Hubs in Audio-Visual LLMs
by: Jung, Jihoo, et al.
Published: (2026)
MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
by: Pham, The Hieu, et al.
Published: (2025)
VoxSim: A perceptual voice similarity dataset
by: Ahn, Junseok, et al.
Published: (2024)
Lightweight Audio Segmentation for Long-form Speech Translation
by: Lee, Jaesong, et al.
Published: (2024)
UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)
SUNAC: Source-aware Unified Neural Audio Codec
by: Aihara, Ryo, et al.
Published: (2025)
The Overview of Segmental Durations Modification Algorithms on Speech Signal Characteristics
by: Jang, Kyeomeun, et al.
Published: (2025)
Relational Proxy Loss for Audio-Text based Keyword Spotting
by: Jung, Youngmoon, et al.
Published: (2024)
UNMIXX: Untangling Highly Correlated Singing Voices Mixtures
by: Jung, Jihoo, et al.
Published: (2026)
Wideband Relative Transfer Function (RTF) Estimation Exploiting Frequency Correlations
by: Bologni, Giovanni, et al.
Published: (2024)
ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
by: Yang, Shu-wen, et al.
Published: (2025)
FUN-SSL: Full-band Layer Followed by U-Net with Narrow-band Layers for Multiple Moving Sound Source Localization
by: Choi, Yuseon, et al.
Published: (2025)
SpeechMLC: Speech Multi-label Classification
by: Kim, Miseul, et al.
Published: (2025)
Neural Spectral Band Generation for Audio Coding
by: Choi, Woongjib, et al.
Published: (2025)
Chirp Group Delay based Onset Detection in Instruments with Fast Attack
by: Joysingh, S. Johanan, et al.
Published: (2024)