Library Catalog

Bibliographic Details
Main Authors:	Choudhary, Yash; Rao, Preeti; Bhattacharyya, Pushpak
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2512.06259

Similar Items

Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction
by: Choudhary, Yash, et al.
Published: (2025)

Guitar Chord Diagram Suggestion for Western Popular Music
by: d'Hooge, Alexandre, et al.
Published: (2024)

MusicLIME: Explainable Multimodal Music Understanding
by: Sotirou, Theodoros, et al.
Published: (2024)

GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models
by: You, Zuyao, et al.
Published: (2026)

Survey on the Evaluation of Generative Models in Music
by: Lerch, Alexander, et al.
Published: (2025)

Depth-Structured Music Recurrence: Budgeted Recurrent Attention for Full-Piece Symbolic Music Modeling
by: Yi, Yungang, et al.
Published: (2026)

Melodic and Metrical Elements of Expressiveness in Hindustani Vocal Music
by: Bhake, Yash, et al.
Published: (2025)

CSyMR: Benchmarking Compositional Music Information Retrieval in Symbolic Music Reasoning
by: Wang, Boyang, et al.
Published: (2025)

MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts
by: Lou, Yuxuan, et al.
Published: (2026)

MuseCPBench: an Empirical Study of Music Editing Methods through Music Context Preservation
by: Vishe, Yash, et al.
Published: (2025)

Multimodal Audio-based Disease Prediction with Transformer-based Hierarchical Fusion Network
by: Cai, Jinjin, et al.
Published: (2024)

ConceptCaps: a Distilled Concept Dataset for Interpretability in Music Models
by: Sienkiewicz, Bruno, et al.
Published: (2026)

Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators
by: Novack, Zachary, et al.
Published: (2026)

Memo2496: Expert-Annotated Dataset and Dual-View Adaptive Framework for Music Emotion Recognition
by: Li, Qilin, et al.
Published: (2025)

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
by: Zhang, Yixiao, et al.
Published: (2024)

Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
by: Bradshaw, Louis, et al.
Published: (2025)

LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning
by: Dhar, Sandipan, et al.
Published: (2025)

Timing Matters: Enhancing User Experience through Temporal Prediction in Smart Homes
by: Ganatra, Shrey, et al.
Published: (2024)

Modality-Invariant Bidirectional Temporal Representation Distillation Network for Missing Multimodal Sentiment Analysis
by: Wang, Xincheng, et al.
Published: (2025)

Modality-Specific Speech Enhancement and Noise-Adaptive Fusion for Acoustic and Body-Conduction Microphone Framework
by: Kim, Yunsik, et al.
Published: (2025)

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
by: Ma, Yinghao, et al.
Published: (2026)

Myna: Masking-Based Contrastive Learning of Musical Representations
by: Yonay, Ori, et al.
Published: (2025)

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
by: Li, Shuyu, et al.
Published: (2025)

Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
by: Wu, Junda, et al.
Published: (2024)

Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
by: Mehta, Atharva, et al.
Published: (2025)

DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
by: Novack, Zachary, et al.
Published: (2024)

Fusion Segment Transformer: Bi-Directional Attention Guided Fusion Network for AI-Generated Music Detection
by: Kim, Yumin, et al.
Published: (2026)

SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
by: Tan, Jiaye, et al.
Published: (2025)

Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation
by: Agarwal, Manvi, et al.
Published: (2025)

WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
by: Emon, Jakaria Islam, et al.
Published: (2025)

Fusing Memory and Attention: A study on LSTM, Transformer and Hybrid Architectures for Symbolic Music Generation
by: Ghoshal, Soudeep, et al.
Published: (2026)

Do Music Generation Models Encode Music Theory?
by: Wei, Megan, et al.
Published: (2024)

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation
by: Fan, Pingyi, et al.
Published: (2025)

An Independence-promoting Loss for Music Generation with Language Models
by: Lemercier, Jean-Marie, et al.
Published: (2024)

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation
by: Prokopiou, Ioannis, et al.
Published: (2026)

TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
by: Pegg, Samuel, et al.
Published: (2024)

Music Source Restoration
by: Zang, Yongyi, et al.
Published: (2025)

AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
by: Wang, Yunsheng, et al.
Published: (2026)

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
by: Huang, Yu-Fen, et al.
Published: (2024)

CloserMusicDB: A Modern Multipurpose Dataset of High Quality Music
by: Piekarzewicz, Aleksandra, et al.
Published: (2024)