:: Library Catalog

Bibliographic Details
Main Authors:	Lee, Hazounne; Kim, Kihong; Lee, Sungho; Lee, Kyogu
Format:	Preprint
Published:	2024
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.19862

Similar Items

Rethinking Speech Representation Aggregation in Speech Enhancement: A Phonetic Mutual Information Perspective
by: Han, Seungu, et al.
Published: (2026)

Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
by: Han, Seungu, et al.
Published: (2025)

Vo-Ve: An Explainable Voice-Vector for Speaker Identity Evaluation
by: Lee, Jaejun, et al.
Published: (2025)

Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
by: Chae, Yunkee, et al.
Published: (2025)

Music De-limiter Networks via Sample-wise Gain Inversion
by: Jeon, Chang-Bin, et al.
Published: (2023)

Do Captioning Metrics Reflect Music Semantic Alignment?
by: Lee, Jinwoo, et al.
Published: (2024)

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
by: Lee, Sungho, et al.
Published: (2024)

Differentiable Acoustic Radiance Transfer
by: Lee, Sungho, et al.
Published: (2025)

MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
by: Chae, Yunkee, et al.
Published: (2025)

Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning
by: Lee, Sungho, et al.
Published: (2025)

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
by: Hwang, Injune, et al.
Published: (2024)

DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper
by: Yi, Jayeon, et al.
Published: (2024)

Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
by: Kim, Jaeyeon, et al.
Published: (2024)

Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation
by: Lee, Jin Woo, et al.
Published: (2024)

String Sound Synthesizer on GPU-accelerated Finite Difference Scheme
by: Lee, Jin Woo, et al.
Published: (2023)

DOSE: Drum One-Shot Extraction from Music Mixture
by: Hwang, Suntae, et al.
Published: (2025)

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings
by: Rhyu, Seungyeon, et al.
Published: (2024)

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
by: Kim, Eungbeom, et al.
Published: (2024)

Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
by: Yu, Chin-Yun, et al.
Published: (2023)

Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial Training
by: Joung, Haesun, et al.
Published: (2024)

Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
by: Lee, Jin Woo, et al.
Published: (2024)

Hear Your Face: Face-based voice conversion with F0 estimation
by: Lee, Jaejun, et al.
Published: (2024)

Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation
by: Oh, Yoori, et al.
Published: (2024)

Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)

Improving Test-Time Performance of RVQ-based Neural Codecs
by: Kim, Hyeongju, et al.
Published: (2025)

SAM: A Mamba-2 State-Space Audio-Language Model
by: Lee, Taehan, et al.
Published: (2025)

SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns
by: Lee, Yongjoon, et al.
Published: (2026)

VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
by: Kim, Heeseung, et al.
Published: (2024)

Erasing Your Voice Before It's Heard: Training-free Speaker Unlearning for Zero-shot Text-to-Speech
by: Lee, Myungjin, et al.
Published: (2026)

Ambisonizer: Neural Upmixing as Spherical Harmonics Generation
by: Zang, Yongyi, et al.
Published: (2024)

DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance
by: Yang, Jinhyeok, et al.
Published: (2024)

Instance-Specific Test-Time Training for Speech Editing in the Wild
by: Kim, Taewoo, et al.
Published: (2025)

Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)

Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
by: Kim, Youngjae, et al.
Published: (2024)

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)

Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
by: Kim, Taewoo, et al.
Published: (2024)

UNMIXX: Untangling Highly Correlated Singing Voices Mixtures
by: Jung, Jihoo, et al.
Published: (2026)

Inter-channel Conv-TasNet for multichannel speech enhancement
by: Lee, Dongheon, et al.
Published: (2021)

Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection
by: Kim, Taewoo, et al.
Published: (2025)

Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)