Library Catalog

Bibliographic Details
Main Authors:	Batra, Arnesh; Sharma, Dev; Thukral, Krish; Bhatia, Ruhani; Batra, Naman; Gautam, Aditya
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2512.00621

Similar Items

Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South
by: Mehta, Atharva, et al.
Published: (2024)

MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection
by: Lu, Tongyu, et al.
Published: (2025)

Exploring Machine Learning and Language Models for Multimodal Depression Detection
by: Hong, Javier Si Zhao, et al.
Published: (2025)

Melody-Guided Music Generation
by: Wei, Shaopeng, et al.
Published: (2024)

SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
by: Ding, Shuangrui, et al.
Published: (2024)

MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core
by: Liao, Callie C., et al.
Published: (2025)

Story2MIDI: Emotionally Aligned Music Generation from Text
by: Shokri, Mohammad, et al.
Published: (2025)

EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation
by: Sajeer, Sahal, et al.
Published: (2026)

Generative Artificial Intelligence, Musical Heritage and the Construction of Peace Narratives: A Case Study in Mali
by: Coulibaly, Nouhoum, et al.
Published: (2026)

Cross-Modal Learning for Music-to-Music-Video Description Generation
by: Mao, Zhuoyuan, et al.
Published: (2025)

Linear Complexity Self-Supervised Learning for Music Understanding with Random Quantizer
by: Vavaroutsos, Petros, et al.
Published: (2026)

MindMelody: A Closed-Loop EEG-Driven System for Personalized Music Intervention
by: Zhang, Yimeng, et al.
Published: (2026)

Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
by: Mehta, Atharva, et al.
Published: (2025)

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
by: Zheng, Junjie, et al.
Published: (2025)

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)

FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
by: Yao, Yiqun, et al.
Published: (2025)

Musical ethnocentrism in Large Language Models
by: Kruspe, Anna
Published: (2025)

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
by: Zhang, Chong, et al.
Published: (2025)

Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning
by: Manco, Ilaria, et al.
Published: (2024)

Synthetic Audio Helps for Cognitive State Tasks
by: Soubki, Adil, et al.
Published: (2025)

StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
by: Papi, Sara, et al.
Published: (2024)

Audio Contrastive-based Fine-tuning: Decoupling Representation Learning and Classification
by: Wang, Yang, et al.
Published: (2023)

Sing it, Narrate it: Quality Musical Lyrics Translation
by: Ye, Zhuorui, et al.
Published: (2024)

AI-Generated Song Detection via Lyrics Transcripts
by: Frohmann, Markus, et al.
Published: (2025)

Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)

Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR
by: Sharma, Rishikesh Kumar, et al.
Published: (2026)

Do Music Generation Models Encode Music Theory?
by: Wei, Megan, et al.
Published: (2024)

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation
by: Della Libera, Luca, et al.
Published: (2026)

Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation
by: Roh, Jaechul, et al.
Published: (2025)

Text2midi: Generating Symbolic Music from Captions
by: Bhandari, Keshav, et al.
Published: (2024)

Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems
by: Kwok, Chin Yuen, et al.
Published: (2025)

Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR
by: Kulkarni, Ajinkya, et al.
Published: (2026)

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
by: Deng, Zihao, et al.
Published: (2023)

DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
by: Mao, Zhuoyuan, et al.
Published: (2025)

WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
by: Rao, Rajath, et al.
Published: (2025)

Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations
by: Khaertdinov, Bulat, et al.
Published: (2024)

Controlling Surprisal in Music Generation via Information Content Curve Matching
by: Bjare, Mathias Rose, et al.
Published: (2024)

Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints
by: Meng, Hao, et al.
Published: (2026)

PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation
by: He, Jiajun, et al.
Published: (2025)