| Main Authors: | Szewczyk, Konrad; Fernández, Daniel Gallo; Townsend, James |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.02401 |
Similar Items
Symbotunes: unified hub for symbolic music generative models
by: Skierś, Paweł, et al.
Published: (2024)
Supervised contrastive learning from weakly-labeled audio segments for musical version matching
by: Serrà, Joan, et al.
Published: (2025)
Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)
Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization
by: Torres, Bernardo, et al.
Published: (2025)
Long-form music generation with latent diffusion
by: Evans, Zach, et al.
Published: (2024)
Deep learning for music generation. Four approaches and their comparative evaluation
by: Paroiu, Razvan, et al.
Published: (2025)
Evaluation of pretrained language models on music understanding
by: Vasilakis, Yannis, et al.
Published: (2024)
Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)
StemGen: A music generation model that listens
by: Parker, Julian D., et al.
Published: (2023)
A Non-autoregressive Model for Joint STT and TTS
by: Sunder, Vishal, et al.
Published: (2025)
Text Conditioned Symbolic Drumbeat Generation using Latent Diffusion Models
by: Jajoria, Pushkar, et al.
Published: (2024)
Steering Autoregressive Music Generation with Recursive Feature Machines
by: Zhao, Daniel, et al.
Published: (2025)
Non-autoregressive real-time Accent Conversion model with voice cloning
by: Nechaev, Vladimir, et al.
Published: (2024)
MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling
by: Rouard, Simon, et al.
Published: (2025)
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
by: Wang, Tianzi, et al.
Published: (2024)
SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
by: Tan, Jiaye, et al.
Published: (2025)
SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
by: Ahmed, Tawsif, et al.
Published: (2025)
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
by: Alex, Tony, et al.
Published: (2025)
Recognizing Ornaments in Vocal Indian Art Music with Active Annotation
by: Kumar, Sumit, et al.
Published: (2025)
SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer
by: Di Carlo, Diego, et al.
Published: (2025)
Speech Command Recognition Using LogNNet Reservoir Computing for Embedded Systems
by: Izotov, Yuriy, et al.
Published: (2025)
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
by: Pasini, Marco, et al.
Published: (2025)
AudioGenX: Explainability on Text-to-Audio Generative Models
by: Kang, Hyunju, et al.
Published: (2025)
A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering
by: Chowdhury, Shahana Yasmin, et al.
Published: (2025)
DeepGB-TB: A Risk-Balanced Cross-Attention Gradient-Boosted Convolutional Network for Rapid, Interpretable Tuberculosis Screening
by: Lu, Zhixiang, et al.
Published: (2025)
Text-Queried Audio Source Separation via Hierarchical Modeling
by: Yin, Xinlei, et al.
Published: (2025)
CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning
by: Kanatas, Angelos-Nikolaos, et al.
Published: (2025)
JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention
by: Ioannides, Georgios, et al.
Published: (2025)
Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture
by: Singh, Karamvir
Published: (2025)
Sparse Autoencoders Make Audio Foundation Models more Explainable
by: Mariotte, Théo, et al.
Published: (2025)
HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection
by: Nejad, Mahsa Ghazvini, et al.
Published: (2025)
ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
by: Zhang, Yucong, et al.
Published: (2025)
When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems
by: Chondhekar, Sujal, et al.
Published: (2025)
Vocal Tract Length Warped Features for Spoken Keyword Spotting
by: Sarkar, Achintya kr., et al.
Published: (2025)
D3RM: A Discrete Denoising Diffusion Refinement Model for Piano Transcription
by: Kim, Hounsu, et al.
Published: (2025)
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation
by: Chung, Yoonjin, et al.
Published: (2025)
Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data
by: Nihal, Ragib Amin, et al.
Published: (2025)
Training-Free Multi-Step Audio Source Separation
by: Zang, Yongyi, et al.
Published: (2025)
CAtCh: Cognitive Assessment through Cookie Thief
by: Colonel, Joseph T, et al.
Published: (2025)
MADUV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge
by: Yang, Zijiang, et al.
Published: (2025)