Bibliographic Details
Main Authors:	Kim, Daewoong; Dong, Hao-Wen; Jeong, Dasaem
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2409.12477

Similar Items

Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space
by: Limberg, Christian, et al.
Published: (2025)

Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting
by: Kim, Hounsu, et al.
Published: (2024)

MusicGen-Chord: Advancing Music Generation through Chord Progressions and Interactive Web-UI
by: Jung, Jongmin, et al.
Published: (2024)

A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons
by: Hung, Tzu-Yun, et al.
Published: (2024)

Is Transfer Learning Necessary for Violin Transcription?
by: Peng, Yueh-Po, et al.
Published: (2025)

T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis
by: Chung, Yoonjin, et al.
Published: (2024)

On the de-duplication of the Lakh MIDI dataset
by: Choi, Eunjin, et al.
Published: (2025)

Prompt-Unseen-Emotion: Zero-shot Expressive Speech Synthesis with Prompt-LLM Contextual Knowledge for Mixed Emotions
by: Gao, Xiaoxue, et al.
Published: (2025)

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
by: Baoueb, Teysir, et al.
Published: (2024)

PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
by: Hono, Yukiya, et al.
Published: (2024)

Deep Active Speech Cancellation with Mamba-Masking Network
by: Mishaly, Yehuda, et al.
Published: (2025)

Lightweight Self-Supervised Detection of Fundamental Frequency and Accurate Probability of Voicing in Monophonic Music
by: Bitra, Venkat Suprabath, et al.
Published: (2026)

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
by: Lee, Sang-Hoon, et al.
Published: (2024)

Scaling Transformers for Low-Bitrate High-Quality Speech Coding
by: Parker, Julian D, et al.
Published: (2024)

An Explainable Proxy Model for Multilabel Audio Segmentation
by: Mariotte, Théo, et al.
Published: (2024)

LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
by: Mancini, Eleonora, et al.
Published: (2024)

Gull: A Generative Multifunctional Audio Codec
by: Luo, Yi, et al.
Published: (2024)

Design Of Rubble Analyzer Probe Using ML For Earthquake
by: Sebastian, Abhishek, et al.
Published: (2023)

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training
by: Wang, Yuancheng, et al.
Published: (2025)

ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications
by: Ackva, Valentin, et al.
Published: (2025)

BUET Multi-disease Heart Sound Dataset: A Comprehensive Auscultation Dataset for Developing Computer-Aided Diagnostic Systems
by: Ali, Shams Nafisa, et al.
Published: (2024)

Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning
by: Tuncay, Ludovic, et al.
Published: (2025)

Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
by: Di Carlo, Diego, et al.
Published: (2025)

MaskSR: Masked Language Model for Full-band Speech Restoration
by: Li, Xu, et al.
Published: (2024)

Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
by: Lee, Sang-Hoon, et al.
Published: (2024)

Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children's Speech?
by: Sinha, Abhijit, et al.
Published: (2025)

AudioFuse: Unified Spectral-Temporal Learning via a Hybrid ViT-1D CNN Architecture for Robust Phonocardiogram Classification
by: Siddiqui, Md. Saiful Bari, et al.
Published: (2025)

Model as Loss: A Self-Consistent Training Paradigm
by: Phaye, Saisamarth Rajesh, et al.
Published: (2025)

When Humans Growl and Birds Speak: High-Fidelity Voice Conversion from Human to Animal and Designed Sounds
by: Kang, Minsu, et al.
Published: (2025)

Real-time Timbre Remapping with Differentiable DSP
by: Shier, Jordie, et al.
Published: (2024)

TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors
by: Huang, Zhaolan, et al.
Published: (2024)

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)

Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models
by: Baoueb, Teysir, et al.
Published: (2025)

Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)

Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation
by: Yoo, HaeJun, et al.
Published: (2024)

LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2
by: Jung, Jongmin, et al.
Published: (2025)

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
by: Wu, Weihao, et al.
Published: (2025)

VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching
by: Choi, Ha-Yeong, et al.
Published: (2025)

SwiftF0: Fast and Accurate Monophonic Pitch Detection
by: Nieradzik, Lars
Published: (2025)

Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)