Staff View: Diffusion-based Symbolic Music Generation with Structured State Space Models :: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Shenghua; Tang, Xing; Chen, Jiatao; Xie, Tianming; Wang, Jing; Shi, Bing
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2507.20128

_version_	1866917306943143936
author	Yuan, Shenghua; Tang, Xing; Chen, Jiatao; Xie, Tianming; Wang, Jing; Shi, Bing
author_facet	Yuan, Shenghua; Tang, Xing; Chen, Jiatao; Xie, Tianming; Wang, Jing; Shi, Bing
contents	Recent advancements in diffusion models have significantly improved symbolic music generation. However, most approaches rely on transformer-based architectures with self-attention mechanisms, which are constrained by quadratic computational complexity, limiting scalability for long sequences. To address this, we propose Symbolic Music Diffusion with Mamba (SMDIM), a novel diffusion-based architecture integrating Structured State Space Models (SSMs) for efficient global context modeling and the Mamba-FeedForward-Attention Block (MFA) for precise local detail preservation. The MFA Block combines the linear complexity of Mamba layers, the non-linear refinement of FeedForward layers, and the fine-grained precision of self-attention mechanisms, achieving a balance between scalability and musical expressiveness. SMDIM achieves near-linear complexity, making it highly efficient for long-sequence tasks. Evaluated on diverse datasets, including FolkDB, a collection of traditional Chinese folk music that represents an underexplored domain in symbolic music generation, SMDIM outperforms state-of-the-art models in both generation quality and computational efficiency. Beyond symbolic music, SMDIM's architectural design demonstrates adaptability to a broad range of long-sequence generation tasks, offering a scalable and efficient solution for coherent sequence modeling.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_20128
institution	arXiv
publishDate	2025
record_format	arxiv
title	Diffusion-based Symbolic Music Generation with Structured State Space Models
topic	Sound
url	https://arxiv.org/abs/2507.20128

Similar Items