| Main Authors: | Nagarajan, Ashwin, Dong, Hao-Wen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.00654 |
Similar Items
Generative AI for Music and Audio
by: Dong, Hao-Wen
Published: (2024)
Segment-Factorized Full-Song Generation on Symbolic Piano Music
by: Chen, Ping-Yi, et al.
Published: (2025)
MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
by: Tan, Hao Hao, et al.
Published: (2024)
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
by: Zhang, Yixiao, et al.
Published: (2024)
Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation
by: Zhu, Tingyu, et al.
Published: (2024)
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
by: Li, Peike, et al.
Published: (2023)
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
by: Long, Phillip, et al.
Published: (2024)
LM2D: Lyrics- and Music-Driven Dance Synthesis
by: Yin, Wenjie, et al.
Published: (2024)
Automatic Music Transcription using Convolutional Neural Networks and Constant-Q transform
by: Telila, Yohannis, et al.
Published: (2025)
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
by: Ma, Yinghao, et al.
Published: (2026)
Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge
by: Shao, Keren, et al.
Published: (2024)
MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers
by: Boudaghi, Ali, et al.
Published: (2025)
Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)
Controllable Video-to-Music Generation with Multiple Time-Varying Conditions
by: Wu, Junxian, et al.
Published: (2025)
ChatMusician: Understanding and Generating Music Intrinsically with LLM
by: Yuan, Ruibin, et al.
Published: (2024)
DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection
by: Fujita, Yoto, et al.
Published: (2024)
Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation
by: Retkowski, Jan, et al.
Published: (2024)
Exploring Adapter Design Tradeoffs for Low Resource Music Generation
by: Mehta, Atharva, et al.
Published: (2025)
GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment
by: Wang, Jinting, et al.
Published: (2025)
MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
by: Ji, Shulei, et al.
Published: (2023)
Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio
by: Batlle-Roca, Roser, et al.
Published: (2024)
Fast Text-to-Audio Generation with Adversarial Post-Training
by: Novack, Zachary, et al.
Published: (2025)
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
by: Lee, Junwon, et al.
Published: (2024)
FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation
by: Jiang, Yuxuan, et al.
Published: (2025)
MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
by: Pasquier, Philippe, et al.
Published: (2025)
GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
by: Zuo, Heda, et al.
Published: (2025)
A Survey of Foundation Models for Music Understanding
by: Li, Wenjun, et al.
Published: (2024)
Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
by: Cheng, Hao, et al.
Published: (2025)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
by: Du, Zhihao, et al.
Published: (2023)
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
by: Deng, Qixin, et al.
Published: (2024)
YuE: Scaling Open Foundation Models for Long-Form Music Generation
by: Yuan, Ruibin, et al.
Published: (2025)
Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach
by: Zhao, Zijian, et al.
Published: (2025)
Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder
by: Luo, Jing, et al.
Published: (2025)
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
by: Zhang, Yixiao, et al.
Published: (2024)
MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions
by: Choi, Suhwan, et al.
Published: (2025)
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
by: Lam, Max W. Y., et al.
Published: (2025)
BandCondiNet: Parallel Transformers-based Conditional Popular Music Generation with Multi-View Features
by: Luo, Jing, et al.
Published: (2024)
From Discord to Harmony: Decomposed Consonance-based Training for Improved Audio Chord Estimation
by: Poltronieri, Andrea, et al.
Published: (2025)
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
by: Shao, Keren, et al.
Published: (2025)
Do Audio-Visual Segmentation Models Truly Segment Sounding Objects?
by: Li, Jia, et al.
Published: (2025)