Library Catalog

Bibliographic Details
Main Authors:	Raj, Vishnu; KV, Gouthaman; Gehlot, Shiv; Villemoes, Lars; Biswas, Arijit
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2509.21463

Similar Items

Towards Evaluating Generative Audio: Insights from Neural Audio Codec Embedding Distances
by: Biswas, Arijit, et al.
Published: (2025)

RF-GML: Reference-Free Generative Machine Listener
by: Biswas, Arijit, et al.
Published: (2024)

Audio Decoding by Inverse Problem Solving
by: T., Pedro J. Villasana, et al.
Published: (2024)

Distribution Preserving Source Separation With Time Frequency Predictive Models
by: T., Pedro J. Villasana, et al.
Published: (2023)

Thinking While Listening: Simple Test Time Scaling For Audio Classification
by: Verma, Prateek, et al.
Published: (2025)

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
by: Du, Zhihao, et al.
Published: (2023)

Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
by: Di Carlo, Diego, et al.
Published: (2025)

SwiftF0: Fast and Accurate Monophonic Pitch Detection
by: Nieradzik, Lars
Published: (2025)

GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model
by: Li, Haoyang, et al.
Published: (2025)

Steering Autoregressive Music Generation with Recursive Feature Machines
by: Zhao, Daniel, et al.
Published: (2025)

Semi-Supervised Contrastive Learning for Controllable Video-to-Music Retrieval
by: Stewart, Shanti, et al.
Published: (2024)

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
by: Liao, Shijia, et al.
Published: (2024)

DeepEmoNet: Building Machine Learning Models for Automatic Emotion Recognition in Human Speeches
by: Vu, Tai
Published: (2025)

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
by: Hussain, Shehzeen, et al.
Published: (2025)

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
by: Zhang, Junan, et al.
Published: (2025)

Aligning Generative Speech Enhancement with Perceptual Feedback
by: Li, Haoyang, et al.
Published: (2025)

Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation
by: Pettenó, Matteo, et al.
Published: (2025)

Efficient Parallel Audio Generation using Group Masked Language Modeling
by: Jeong, Myeonghun, et al.
Published: (2024)

Re-ENACT: Reinforcement Learning for Emotional Speech Generation using Actor-Critic Strategy
by: Shankar, Ravi, et al.
Published: (2024)

Machine listening in a neonatal intensive care unit
by: Tailleur, Modan, et al.
Published: (2024)

GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting
by: Zhu, Pai, et al.
Published: (2024)

On the Joint Minimization of Regularization Loss Functions in Deep Variational Bayesian Methods for Attribute-Controlled Symbolic Music Generation
by: Pettenó, Matteo, et al.
Published: (2025)

Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
by: Hu, Yuchen, et al.
Published: (2024)

Machine Learning Techniques in Automatic Music Transcription: A Systematic Survey
by: Jamshidi, Fatemeh, et al.
Published: (2024)

Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature
by: Vrba, Jan, et al.
Published: (2024)

HEAR: Holistic Evaluation of Audio Representations
by: Turian, Joseph, et al.
Published: (2022)

Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets
by: Anidjar, Or Haim, et al.
Published: (2025)

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
by: Alex, Tony, et al.
Published: (2025)

Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture
by: Singh, Karamvir
Published: (2025)

DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
by: Chen, Yang, et al.
Published: (2024)

Simple and Controllable Music Generation
by: Copet, Jade, et al.
Published: (2023)

From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training
by: Yao, Mingyang, et al.
Published: (2025)

A Survey of Music Generation in the Context of Interaction
by: Agchar, Ismael, et al.
Published: (2024)

Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers
by: Işık, Atakan, et al.
Published: (2025)

Enhanced Speech Emotion Recognition with Efficient Channel Attention Guided Deep CNN-BiLSTM Framework
by: Kundu, Niloy Kumar, et al.
Published: (2024)

Multi-Metric Preference Alignment for Generative Speech Restoration
by: Zhang, Junan, et al.
Published: (2025)

DITTO: Diffusion Inference-Time T-Optimization for Music Generation
by: Novack, Zachary, et al.
Published: (2024)

Presto! Distilling Steps and Layers for Accelerating Music Generation
by: Novack, Zachary, et al.
Published: (2024)

Single and Few-step Diffusion for Generative Speech Enhancement
by: Lay, Bunlong, et al.
Published: (2023)

A Survey of Deep Learning Audio Generation Methods
by: Božić, Matej, et al.
Published: (2024)