| Main Authors: | Raj, Vishnu, KV, Gouthaman, Gehlot, Shiv, Villemoes, Lars, Biswas, Arijit |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.21463 |
Similar Items
Towards Evaluating Generative Audio: Insights from Neural Audio Codec Embedding Distances
by: Biswas, Arijit, et al.
Published: (2025)
RF-GML: Reference-Free Generative Machine Listener
by: Biswas, Arijit, et al.
Published: (2024)
Audio Decoding by Inverse Problem Solving
by: T., Pedro J. Villasana, et al.
Published: (2024)
Distribution Preserving Source Separation With Time Frequency Predictive Models
by: T., Pedro J. Villasana, et al.
Published: (2023)
Thinking While Listening: Simple Test Time Scaling For Audio Classification
by: Verma, Prateek, et al.
Published: (2025)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
by: Du, Zhihao, et al.
Published: (2023)
Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
by: Di Carlo, Diego, et al.
Published: (2025)
SwiftF0: Fast and Accurate Monophonic Pitch Detection
by: Nieradzik, Lars
Published: (2025)
GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model
by: Li, Haoyang, et al.
Published: (2025)
Steering Autoregressive Music Generation with Recursive Feature Machines
by: Zhao, Daniel, et al.
Published: (2025)
Semi-Supervised Contrastive Learning for Controllable Video-to-Music Retrieval
by: Stewart, Shanti, et al.
Published: (2024)
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
by: Liao, Shijia, et al.
Published: (2024)
DeepEmoNet: Building Machine Learning Models for Automatic Emotion Recognition in Human Speeches
by: Vu, Tai
Published: (2025)
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
by: Hussain, Shehzeen, et al.
Published: (2025)
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
by: Zhang, Junan, et al.
Published: (2025)
Aligning Generative Speech Enhancement with Perceptual Feedback
by: Li, Haoyang, et al.
Published: (2025)
Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation
by: Pettenó, Matteo, et al.
Published: (2025)
Efficient Parallel Audio Generation using Group Masked Language Modeling
by: Jeong, Myeonghun, et al.
Published: (2024)
Re-ENACT: Reinforcement Learning for Emotional Speech Generation using Actor-Critic Strategy
by: Shankar, Ravi, et al.
Published: (2024)
Machine listening in a neonatal intensive care unit
by: Tailleur, Modan, et al.
Published: (2024)
GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting
by: Zhu, Pai, et al.
Published: (2024)
On the Joint Minimization of Regularization Loss Functions in Deep Variational Bayesian Methods for Attribute-Controlled Symbolic Music Generation
by: Pettenó, Matteo, et al.
Published: (2025)
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
by: Hu, Yuchen, et al.
Published: (2024)
Machine Learning Techniques in Automatic Music Transcription: A Systematic Survey
by: Jamshidi, Fatemeh, et al.
Published: (2024)
Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature
by: Vrba, Jan, et al.
Published: (2024)
HEAR: Holistic Evaluation of Audio Representations
by: Turian, Joseph, et al.
Published: (2022)
Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets
by: Anidjar, Or Haim, et al.
Published: (2025)
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
by: Alex, Tony, et al.
Published: (2025)
Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture
by: Singh, Karamvir
Published: (2025)
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
by: Chen, Yang, et al.
Published: (2024)
Simple and Controllable Music Generation
by: Copet, Jade, et al.
Published: (2023)
From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training
by: Yao, Mingyang, et al.
Published: (2025)
A Survey of Music Generation in the Context of Interaction
by: Agchar, Ismael, et al.
Published: (2024)
Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers
by: Işık, Atakan, et al.
Published: (2025)
Enhanced Speech Emotion Recognition with Efficient Channel Attention Guided Deep CNN-BiLSTM Framework
by: Kundu, Niloy Kumar, et al.
Published: (2024)
Multi-Metric Preference Alignment for Generative Speech Restoration
by: Zhang, Junan, et al.
Published: (2025)
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
by: Novack, Zachary, et al.
Published: (2024)
Presto! Distilling Steps and Layers for Accelerating Music Generation
by: Novack, Zachary, et al.
Published: (2024)
Single and Few-step Diffusion for Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2023)
A Survey of Deep Learning Audio Generation Methods
by: Božić, Matej, et al.
Published: (2024)