| Main Authors: | Lee, Hazounne, Kim, Kihong, Lee, Sungho, Lee, Kyogu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.19862 |
Similar Items
Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective
by: Han, Seungu, et al.
Published: (2026)
Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
by: Han, Seungu, et al.
Published: (2025)
Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
by: Lee, Jaejun, et al.
Published: (2025)
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
by: Chae, Yunkee, et al.
Published: (2025)
Music De-limiter Networks via Sample-wise Gain Inversion
by: Jeon, Chang-Bin, et al.
Published: (2023)
Do Captioning Metrics Reflect Music Semantic Alignment?
by: Lee, Jinwoo, et al.
Published: (2024)
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
by: Lee, Sungho, et al.
Published: (2024)
Differentiable Acoustic Radiance Transfer
by: Lee, Sungho, et al.
Published: (2025)
MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
by: Chae, Yunkee, et al.
Published: (2025)
Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning
by: Lee, Sungho, et al.
Published: (2025)
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
by: Hwang, Injune, et al.
Published: (2024)
DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper
by: Yi, Jayeon, et al.
Published: (2024)
Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
by: Kim, Jaeyeon, et al.
Published: (2024)
Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation
by: Lee, Jin Woo, et al.
Published: (2024)
String Sound Synthesizer on GPU-accelerated Finite Difference Scheme
by: Lee, Jin Woo, et al.
Published: (2023)
DOSE : Drum One-Shot Extraction from Music Mixture
by: Hwang, Suntae, et al.
Published: (2025)
Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings
by: Rhyu, Seungyeon, et al.
Published: (2024)
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
by: Kim, Eungbeom, et al.
Published: (2024)
Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
by: Yu, Chin-Yun, et al.
Published: (2023)
Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial Training
by: Joung, Haesun, et al.
Published: (2024)
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
by: Lee, Jin Woo, et al.
Published: (2024)
Hear Your Face: Face-based voice conversion with F0 estimation
by: Lee, Jaejun, et al.
Published: (2024)
Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation
by: Oh, Yoori, et al.
Published: (2024)
Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)
Improving Test-Time Performance of RVQ-based Neural Codecs
by: Kim, Hyeongju, et al.
Published: (2025)
SAM: A Mamba-2 State-Space Audio-Language Model
by: Lee, Taehan, et al.
Published: (2025)
SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns
by: Lee, Yongjoon, et al.
Published: (2026)
VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
by: Kim, Heeseung, et al.
Published: (2024)
Erasing Your Voice Before It's Heard: Training-free Speaker Unlearning for Zero-shot Text-to-Speech
by: Lee, Myungjin, et al.
Published: (2026)
Ambisonizer: Neural Upmixing as Spherical Harmonics Generation
by: Zang, Yongyi, et al.
Published: (2024)
DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance
by: Yang, Jinhyeok, et al.
Published: (2024)
Instance-Specific Test-Time Training for Speech Editing in the Wild
by: Kim, Taewoo, et al.
Published: (2025)
Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
by: Kim, Youngjae, et al.
Published: (2024)
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
by: Kim, Taewoo, et al.
Published: (2024)
UNMIXX: Untangling Highly Correlated Singing Voices Mixtures
by: Jung, Jihoo, et al.
Published: (2026)
Inter-channel Conv-TasNet for multichannel speech enhancement
by: Lee, Dongheon, et al.
Published: (2021)
Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection
by: Kim, Taewoo, et al.
Published: (2025)
Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)