:: Library Catalog

Bibliographic Details
Main Authors:	Bhattacharjee, Aditya; Higgs, Ivan Meresman; Sandler, Mark; Benetos, Emmanouil
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Information Retrieval; H.5.5; I.2.6
Online Access:	https://arxiv.org/abs/2506.14684

Similar Items

GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)

Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
by: Bhattacharjee, Aditya, et al.
Published: (2025)

PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching
by: Vosoughi, Ali, et al.
Published: (2025)

MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation
by: Zhu, Di, et al.
Published: (2026)

Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)

Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)

Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)

Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)

Step-Audio-R1 Technical Report
by: Tian, Fei, et al.
Published: (2025)

Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)

Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds
by: Cauzinille, Jules, et al.
Published: (2025)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
by: Tsangko, Iosif, et al.
Published: (2025)

Listen to the Unexpected: Self-Supervised Surprise Detection for Efficient Viewport Prediction
by: Khah, Arman Nik, et al.
Published: (2026)

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)

Machine learning based animal emotion classification using audio signals
by: Slobodian, Mariia, et al.
Published: (2025)

HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)

BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation
by: Park, Taesoo, et al.
Published: (2025)

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond
by: Richter-Powell, Jessie, et al.
Published: (2025)

Matcha-TTS: A fast TTS architecture with conditional flow matching
by: Mehta, Shivam, et al.
Published: (2023)

BMdataset: A Musicologically Curated LilyPond Dataset
by: Spanio, Matteo, et al.
Published: (2026)

Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)

Automatic Album Sequencing
by: Herrmann, Vincent, et al.
Published: (2024)

APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
by: Husain, Jaavid Aktar, et al.
Published: (2026)

Reciprocal Latent Fields for Precomputed Sound Propagation
by: Seuté, Hugo, et al.
Published: (2026)

Masked Contrastive Pre-Training Improves Music Audio Key Detection
by: Yonay, Ori, et al.
Published: (2026)

SoundPlot: An Open-Source Framework for Birdsong Acoustic Analysis and Neural Synthesis with Interactive 3D Visualization
by: Mehdi, Naqcho Ali, et al.
Published: (2026)

The evolution of inharmonicity and noisiness in contemporary popular music
by: Deruty, Emmanuel, et al.
Published: (2024)

Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model
by: Wang, Zihao, et al.
Published: (2025)

BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps
by: Qian, Lekai, et al.
Published: (2026)

The Binding Effect: Analyzing How Multi-Dimensional Cues Form Gender Bias in Instruction TTS
by: Chen, Kuan-Yu, et al.
Published: (2026)

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)

Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)

A Survey on World Models Grounded in Acoustic Physical Information
by: Chen, Xiaoliang, et al.
Published: (2025)

Dichotic harmony for the musical practice
by: Madgazin, Vadim R.
Published: (2010)

Benchmarking Sub-Genre Classification For Mainstage Dance Music
by: Shu, Hongzhi, et al.
Published: (2024)

Generation of Musical Timbres using a Text-Guided Diffusion Model
by: Yuan, Weixuan, et al.
Published: (2025)

Self-Improvement for Audio Large Language Model using Unlabeled Speech
by: Wang, Shaowen, et al.
Published: (2025)

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)