| Main Authors: | Li, Chang; Wang, Ruoyu; Liu, Lijuan; Du, Jun; Sun, Yixuan; Guo, Zilu; Zhang, Zhenrong; Jiang, Yuan; Gao, Jianqing; Ma, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2405.15863 |
Similar Items
Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning
by: Guo, Zilu, et al.
Published: (2023)
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
by: Hou, Siyuan, et al.
Published: (2024)
Improving Musical Accompaniment Co-creation via Diffusion Transformers
by: Nistal, Javier, et al.
Published: (2024)
METEOR: Melody-aware Texture-controllable Symbolic Orchestral Music Generation via Transformer VAE
by: Le, Dinh-Viet-Toan, et al.
Published: (2024)
MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT
by: Zhu, Jinlong, et al.
Published: (2024)
MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
by: Wang, Yutian, et al.
Published: (2024)
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
by: Wang, Ruoyu, et al.
Published: (2024)
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
by: Niu, Shutong, et al.
Published: (2024)
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
by: Dai, Yusheng, et al.
Published: (2025)
Music Style Transfer with Time-Varying Inversion of Diffusion Models
by: Li, Sifei, et al.
Published: (2024)
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
by: Bai, Ye, et al.
Published: (2024)
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
by: Wu, Shangda, et al.
Published: (2025)
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing
by: Wu, Shangda, et al.
Published: (2024)
ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis
by: Ni-Hahn, Stephen, et al.
Published: (2025)
Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics
by: Lehmkuhl, Jonathan, et al.
Published: (2025)
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
by: Zhang, Jing-Xuan, et al.
Published: (2025)
OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction
by: Alonso-Jiménez, Pablo, et al.
Published: (2025)
Vision-Integrated High-Quality Neural Speech Coding
by: Guo, Yao, et al.
Published: (2025)
A Diffusion-Based Generative Equalizer for Music Restoration
by: Moliner, Eloi, et al.
Published: (2024)
Improving Music Source Separation with Diffusion and Consistency Refinement
by: Karchkhadze, Tornike, et al.
Published: (2024)
MuPT: A Generative Symbolic Music Pretrained Transformer
by: Qu, Xingwei, et al.
Published: (2024)
Music2Fail: Transfer Music to Failed Recorder Style
by: Leong, Chon In, et al.
Published: (2024)
Hear: Hierarchically Enhanced Aesthetic Representations For Multidimensional Music Evaluation
by: Liu, Shuyang, et al.
Published: (2025)
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation
by: Chang, Sungkyun, et al.
Published: (2024)
Music De-limiter Networks via Sample-wise Gain Inversion
by: Jeon, Chang-Bin, et al.
Published: (2023)
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer
by: Hai, Jiarui, et al.
Published: (2024)
Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
by: Huang, Jiawen, et al.
Published: (2025)
Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)
StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
by: Guo, Dake, et al.
Published: (2025)
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
by: Koo, Junghyun, et al.
Published: (2024)
Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription
by: He, Zhanhong, et al.
Published: (2025)
GD-Retriever: Controllable Generative Text-Music Retrieval with Diffusion Models
by: Guinot, Julien, et al.
Published: (2025)
MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer
by: Yao, Dong, et al.
Published: (2023)
SteerMusic: Enhanced Musical Consistency for Zero-shot Text-guided and Personalized Music Editing
by: Niu, Xinlei, et al.
Published: (2025)
Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model
by: Karchkhadze, Tornike, et al.
Published: (2024)
Melodia: Training-Free Music Editing Guided by Attention Probing in Diffusion Models
by: Yang, Yi, et al.
Published: (2025)
Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models
by: Karchkhadze, Tornike, et al.
Published: (2024)
Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning
by: Zhang, Jisi, et al.
Published: (2025)
Anticipatory Music Transformer
by: Thickstun, John, et al.
Published: (2023)
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
by: He, Mao-Kui, et al.
Published: (2024)