| Main Authors: | Bhattacharjee, Aditya, Higgs, Ivan Meresman, Sandler, Mark, Benetos, Emmanouil |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.14684 |
Similar Items
GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)
Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
by: Bhattacharjee, Aditya, et al.
Published: (2025)
PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching
by: Vosoughi, Ali, et al.
Published: (2025)
MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation
by: Zhu, Di, et al.
Published: (2026)
Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)
Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)
Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)
Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)
Step-Audio-R1 Technical Report
by: Tian, Fei, et al.
Published: (2025)
Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)
Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds
by: Cauzinille, Jules, et al.
Published: (2025)
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)
SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)
DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
by: Tsangko, Iosif, et al.
Published: (2025)
Listen to the Unexpected: Self-Supervised Surprise Detection for Efficient Viewport Prediction
by: Khah, Arman Nik, et al.
Published: (2026)
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)
Machine learning based animal emotion classification using audio signals
by: Slobodian, Mariia, et al.
Published: (2025)
HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)
BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation
by: Park, Taesoo, et al.
Published: (2025)
Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond
by: Richter-Powell, Jessie, et al.
Published: (2025)
Matcha-TTS: A fast TTS architecture with conditional flow matching
by: Mehta, Shivam, et al.
Published: (2023)
BMdataset: A Musicologically Curated LilyPond Dataset
by: Spanio, Matteo, et al.
Published: (2026)
Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)
Automatic Album Sequencing
by: Herrmann, Vincent, et al.
Published: (2024)
APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
by: Husain, Jaavid Aktar, et al.
Published: (2026)
Reciprocal Latent Fields for Precomputed Sound Propagation
by: Seuté, Hugo, et al.
Published: (2026)
Masked Contrastive Pre-Training Improves Music Audio Key Detection
by: Yonay, Ori, et al.
Published: (2026)
SoundPlot: An Open-Source Framework for Birdsong Acoustic Analysis and Neural Synthesis with Interactive 3D Visualization
by: Mehdi, Naqcho Ali, et al.
Published: (2026)
The evolution of inharmonicity and noisiness in contemporary popular music
by: Deruty, Emmanuel, et al.
Published: (2024)
Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model
by: Wang, Zihao, et al.
Published: (2025)
BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps
by: Qian, Lekai, et al.
Published: (2026)
The Binding Effect: Analyzing How Multi-Dimensional Cues Form Gender Bias in Instruction TTS
by: Chen, Kuan-Yu, et al.
Published: (2026)
ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)
Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)
A Survey on World Models Grounded in Acoustic Physical Information
by: Chen, Xiaoliang, et al.
Published: (2025)
Dichotic harmony for the musical practice
by: Madgazin, Vadim R.
Published: (2010)
Benchmarking Sub-Genre Classification For Mainstage Dance Music
by: Shu, Hongzhi, et al.
Published: (2024)
Generation of Musical Timbres using a Text-Guided Diffusion Model
by: Yuan, Weixuan, et al.
Published: (2025)
Self-Improvement for Audio Large Language Model using Unlabeled Speech
by: Wang, Shaowen, et al.
Published: (2025)
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)