| Main Authors: | Kim, Daewoong, Dong, Hao-Wen, Jeong, Dasaem |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.12477 |
Similar Items
Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space
by: Limberg, Christian, et al.
Published: (2025)
Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting
by: Kim, Hounsu, et al.
Published: (2024)
MusicGen-Chord: Advancing Music Generation through Chord Progressions and Interactive Web-UI
by: Jung, Jongmin, et al.
Published: (2024)
A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons
by: Hung, Tzu-Yun, et al.
Published: (2024)
Is Transfer Learning Necessary for Violin Transcription?
by: Peng, Yueh-Po, et al.
Published: (2025)
T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis
by: Chung, Yoonjin, et al.
Published: (2024)
On the de-duplication of the Lakh MIDI dataset
by: Choi, Eunjin, et al.
Published: (2025)
Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions
by: Gao, Xiaoxue, et al.
Published: (2025)
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
by: Baoueb, Teysir, et al.
Published: (2024)
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
by: Hono, Yukiya, et al.
Published: (2024)
Deep Active Speech Cancellation with Mamba-Masking Network
by: Mishaly, Yehuda, et al.
Published: (2025)
Lightweight Self-Supervised Detection of Fundamental Frequency and Accurate Probability of Voicing in Monophonic Music
by: Bitra, Venkat Suprabath, et al.
Published: (2026)
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
by: Lee, Sang-Hoon, et al.
Published: (2024)
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
by: Parker, Julian D, et al.
Published: (2024)
An Explainable Proxy Model for Multilabel Audio Segmentation
by: Mariotte, Théo, et al.
Published: (2024)
LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
by: Mancini, Eleonora, et al.
Published: (2024)
Gull: A Generative Multifunctional Audio Codec
by: Luo, Yi, et al.
Published: (2024)
Design Of Rubble Analyzer Probe Using ML For Earthquake
by: Sebastian, Abhishek, et al.
Published: (2023)
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training
by: Wang, Yuancheng, et al.
Published: (2025)
ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications
by: Ackva, Valentin, et al.
Published: (2025)
BUET Multi-disease Heart Sound Dataset: A Comprehensive Auscultation Dataset for Developing Computer-Aided Diagnostic Systems
by: Ali, Shams Nafisa, et al.
Published: (2024)
Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning
by: Tuncay, Ludovic, et al.
Published: (2025)
Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
by: Di Carlo, Diego, et al.
Published: (2025)
MaskSR: Masked Language Model for Full-band Speech Restoration
by: Li, Xu, et al.
Published: (2024)
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
by: Lee, Sang-Hoon, et al.
Published: (2024)
Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children's Speech?
by: Sinha, Abhijit, et al.
Published: (2025)
AudioFuse: Unified Spectral-Temporal Learning via a Hybrid ViT-1D CNN Architecture for Robust Phonocardiogram Classification
by: Siddiqui, Md. Saiful Bari, et al.
Published: (2025)
Model as Loss: A Self-Consistent Training Paradigm
by: Phaye, Saisamarth Rajesh, et al.
Published: (2025)
When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds
by: Kang, Minsu, et al.
Published: (2025)
Real-time Timbre Remapping with Differentiable DSP
by: Shier, Jordie, et al.
Published: (2024)
TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors
by: Huang, Zhaolan, et al.
Published: (2024)
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)
Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models
by: Baoueb, Teysir, et al.
Published: (2025)
Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)
Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation
by: Yoo, HaeJun, et al.
Published: (2024)
LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2
by: Jung, Jongmin, et al.
Published: (2025)
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
by: Wu, Weihao, et al.
Published: (2025)
VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching
by: Choi, Ha-Yeong, et al.
Published: (2025)
SwiftF0: Fast and Accurate Monophonic Pitch Detection
by: Nieradzik, Lars
Published: (2025)
Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)