:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xuanchen, Wang, Heng, Cai, Weidong
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Multimedia
Online Access:	https://arxiv.org/abs/2510.13244

Similar Items

ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion
by: Wang, Xuanchen, et al.
Published: (2025)

Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
by: Wang, Xuanchen, et al.
Published: (2024)

MusicWeaver: Composer-Style Structural Editing and Minute-Scale Coherent Music Generation
by: Wang, Xuanchen, et al.
Published: (2025)

Music Arena: Live Evaluation for Text-to-Music
by: Kim, Yonghyun, et al.
Published: (2025)

MusicSwarm: Biologically Inspired Intelligence for Music Composition
by: Buehler, Markus J.
Published: (2025)

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
by: Li, Shuyu, et al.
Published: (2025)

Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition
by: Xia, Haiying, et al.
Published: (2025)

Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
by: Li, Xiaojie, et al.
Published: (2025)

Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music
by: Su, Hongju, et al.
Published: (2025)

AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives
by: Chen, Yanxi, et al.
Published: (2025)

EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation
by: Izzati, Fathinah, et al.
Published: (2025)

GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment
by: Wang, Jinting, et al.
Published: (2025)

Memo2496: Expert-Annotated Dataset and Dual-View Adaptive Framework for Music Emotion Recognition
by: Li, Qilin, et al.
Published: (2025)

Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
by: Mehta, Atharva, et al.
Published: (2025)

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
by: Zhang, Yixiao, et al.
Published: (2024)

MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core
by: Liao, Callie C., et al.
Published: (2025)

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
by: Gan, Qijun, et al.
Published: (2024)

Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators
by: Novack, Zachary, et al.
Published: (2026)

Exploring Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations
by: Sun, Yujia, et al.
Published: (2024)

Emotion-Aligned Contrastive Learning Between Images and Music
by: Stewart, Shanti, et al.
Published: (2023)

A Survey of Foundation Models for Music Understanding
by: Li, Wenjun, et al.
Published: (2024)

AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement
by: Sajid, M., et al.
Published: (2025)

Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation
by: Shen, Nanhan, et al.
Published: (2026)

Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation
by: Retkowski, Jan, et al.
Published: (2024)

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio
by: Alonso-Jiménez, Pablo, et al.
Published: (2024)

MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
by: Ji, Shulei, et al.
Published: (2023)

MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions
by: Choi, Suhwan, et al.
Published: (2025)

Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio
by: Batlle-Roca, Roser, et al.
Published: (2024)

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
by: Ma, Menghe, et al.
Published: (2026)

Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
by: Fan, Congyi, et al.
Published: (2025)

Cross-Modal Learning for Music-to-Music-Video Description Generation
by: Mao, Zhuoyuan, et al.
Published: (2025)

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction
by: Zhang, Zhisheng, et al.
Published: (2026)

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
by: Zhang, Yixiao, et al.
Published: (2024)

ReactMotion: Generating Reactive Listener Motions from Speaker Utterance
by: Luo, Cheng, et al.
Published: (2026)

Hierarchical Semantic Correlation-Aware Masked Autoencoder for Unsupervised Audio-Visual Representation Learning
by: Zeng, Donghuo, et al.
Published: (2026)

YuE: Scaling Open Foundation Models for Long-Form Music Generation
by: Yuan, Ruibin, et al.
Published: (2025)

MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
by: You, Fuming, et al.
Published: (2024)

SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs
by: Desai, Shail, et al.
Published: (2025)

Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks
by: Zhang, Hailong, et al.
Published: (2025)

The Name-Free Gap: Policy-Aware Stylistic Control in Music Generation
by: Nagarajan, Ashwin, et al.
Published: (2025)