| Main Authors: | Chauhan, Shivam, Pundhir, Ajay |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.10503 |
Similar Items
Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
by: Mehta, Atharva, et al.
Published: (2025)
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)
Music Genre Classification: A Comparative Analysis of CNN and XGBoost Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms
by: Meng, Yigang
Published: (2024)
Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms
by: Penumajji, Niketa
Published: (2025)
Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South
by: Mehta, Atharva, et al.
Published: (2024)
Exploring Adapter Design Tradeoffs for Low Resource Music Generation
by: Mehta, Atharva, et al.
Published: (2025)
dMel: Speech Tokenization made Simple
by: Bai, Richard He, et al.
Published: (2024)
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization
by: Zhu, Haina, et al.
Published: (2025)
A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis
by: Hu, Guoqiang, et al.
Published: (2024)
CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning
by: Kanatas, Angelos-Nikolaos, et al.
Published: (2025)
Graph Embedding with Mel-spectrograms for Underwater Acoustic Target Recognition
by: Feng, Sheng, et al.
Published: (2025)
Repurposing Image Diffusion Models for Training-Free Music Style Transfer on Mel-spectrograms
by: Wang, Heehwan, et al.
Published: (2024)
CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance
by: Zhao, Junchuan, et al.
Published: (2025)
Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
by: Bjare, Mathias Rose, et al.
Published: (2025)
Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization
by: Lin, Tzu-Quan, et al.
Published: (2025)
WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
by: Emon, Jakaria Islam, et al.
Published: (2025)
Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)
AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)
Layer-wise Investigation of Large-Scale Self-Supervised Music Representation Models
by: Zhou, Yizhi, et al.
Published: (2025)
AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement
by: Sajid, M., et al.
Published: (2025)
Audio Signal Processing Using Time Domain Mel-Frequency Wavelet Coefficient
by: Sebastian, Rinku, et al.
Published: (2025)
Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation
by: Liu, Jiafeng, et al.
Published: (2026)
Myna: Masking-Based Contrastive Learning of Musical Representations
by: Yonay, Ori, et al.
Published: (2025)
JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata
by: Roy, Abhinaba, et al.
Published: (2025)
MusicSynth: An Automated Pipeline for Generating Violin Fingerboard Animations from Sheet Music Using Optical Music Recognition
by: Kaushik, Abhimanyu
Published: (2026)
Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations
by: Han, Yichen, et al.
Published: (2025)
Exploring Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations
by: Sun, Yujia, et al.
Published: (2024)
Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
by: Guo, Xiaoxuan, et al.
Published: (2026)
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
by: Li, Shuyu, et al.
Published: (2025)
Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
by: Choudhary, Yash, et al.
Published: (2025)
HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
by: Go, Seonghyeon, et al.
Published: (2026)
Music Arena: Live Evaluation for Text-to-Music
by: Kim, Yonghyun, et al.
Published: (2025)
Cross-Learning Fine-Tuning Strategy for Dysarthric Speech Recognition Via CDSD database
by: Xiao, Qing, et al.
Published: (2025)
CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music Generation
by: Hu, Zhejing, et al.
Published: (2025)
MuseCPBench: an Empirical Study of Music Editing Methods through Music Context Preservation
by: Vishe, Yash, et al.
Published: (2025)
SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis
by: Staněk, Vojtěch, et al.
Published: (2025)
Device-Guided Music Transfer
by: Hung, Manh Pham, et al.
Published: (2025)
MusicSwarm: Biologically Inspired Intelligence for Music Composition
by: Buehler, Markus J.
Published: (2025)
Modeling Music as a Time-Frequency Image: A 2D Tokenizer for Music Generation
by: Cheng, Yuqing, et al.
Published: (2026)