:: Library Catalog

Bibliographic Details
Main Authors:	Chauhan, Shivam; Pundhir, Ajay
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.10503

Similar Items

Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
by: Mehta, Atharva, et al.
Published: (2025)

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)

Music Genre Classification: A Comparative Analysis of CNN and XGBoost Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms
by: Meng, Yigang
Published: (2024)

Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms
by: Penumajji, Niketa
Published: (2025)

Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South
by: Mehta, Atharva, et al.
Published: (2024)

Exploring Adapter Design Tradeoffs for Low Resource Music Generation
by: Mehta, Atharva, et al.
Published: (2025)

dMel: Speech Tokenization made Simple
by: Bai, Richard He, et al.
Published: (2024)

MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization
by: Zhu, Haina, et al.
Published: (2025)

A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis
by: Hu, Guoqiang, et al.
Published: (2024)

CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning
by: Kanatas, Angelos-Nikolaos, et al.
Published: (2025)

Graph Embedding with Mel-spectrograms for Underwater Acoustic Target Recognition
by: Feng, Sheng, et al.
Published: (2025)

Repurposing Image Diffusion Models for Training-Free Music Style Transfer on Mel-spectrograms
by: Wang, Heehwan, et al.
Published: (2024)

CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance
by: Zhao, Junchuan, et al.
Published: (2025)

Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
by: Bjare, Mathias Rose, et al.
Published: (2025)

Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization
by: Lin, Tzu-Quan, et al.
Published: (2025)

WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
by: Emon, Jakaria Islam, et al.
Published: (2025)

Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)

AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)

Layer-wise Investigation of Large-Scale Self-Supervised Music Representation Models
by: Zhou, Yizhi, et al.
Published: (2025)

AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement
by: Sajid, M., et al.
Published: (2025)

Audio Signal Processing Using Time Domain Mel-Frequency Wavelet Coefficient
by: Sebastian, Rinku, et al.
Published: (2025)

Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation
by: Liu, Jiafeng, et al.
Published: (2026)

Myna: Masking-Based Contrastive Learning of Musical Representations
by: Yonay, Ori, et al.
Published: (2025)

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata
by: Roy, Abhinaba, et al.
Published: (2025)

MusicSynth: An Automated Pipeline for Generating Violin Fingerboard Animations from Sheet Music Using Optical Music Recognition
by: Kaushik, Abhimanyu
Published: (2026)

Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations
by: Han, Yichen, et al.
Published: (2025)

Exploring Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations
by: Sun, Yujia, et al.
Published: (2024)

Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
by: Guo, Xiaoxuan, et al.
Published: (2026)

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
by: Li, Shuyu, et al.
Published: (2025)

Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
by: Choudhary, Yash, et al.
Published: (2025)

HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
by: Go, Seonghyeon, et al.
Published: (2026)

Music Arena: Live Evaluation for Text-to-Music
by: Kim, Yonghyun, et al.
Published: (2025)

Cross-Learning Fine-Tuning Strategy for Dysarthric Speech Recognition Via CDSD database
by: Xiao, Qing, et al.
Published: (2025)

CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music Generation
by: Hu, Zhejing, et al.
Published: (2025)

MuseCPBench: an Empirical Study of Music Editing Methods through Music Context Preservation
by: Vishe, Yash, et al.
Published: (2025)

SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis
by: Staněk, Vojtěch, et al.
Published: (2025)

Device-Guided Music Transfer
by: Hung, Manh Pham, et al.
Published: (2025)

MusicSwarm: Biologically Inspired Intelligence for Music Composition
by: Buehler, Markus J.
Published: (2025)

Modeling Music as a Time-Frequency Image: A 2D Tokenizer for Music Generation
by: Cheng, Yuqing, et al.
Published: (2026)