| Main Authors: | Chen, Huakang, Cheng, Wenkai, Ma, Guobin, Hao, Chunbo, Xia, Yuxuan, Wei, Mengqi, Zhao, Zhixian, Zhu, Pengcheng, Zhang, Hanbing, Xie, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.17414 |
Similar Items
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)
YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
by: Hao, Chunbo, et al.
Published: (2026)
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization
by: Chen, Huakang, et al.
Published: (2025)
SongEval: A Benchmark Dataset for Song Aesthetics Evaluation
by: Yao, Jixun, et al.
Published: (2025)
dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)
SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision
by: Hao, Chunbo, et al.
Published: (2025)
OmniCodec: Low Frame Rate Universal Audio Codec with Semantic-Acoustic Disentanglement
by: Hu, Jingbin, et al.
Published: (2026)
EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs
by: Tian, Wenjie, et al.
Published: (2026)
The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
by: Ma, Guobin, et al.
Published: (2026)
Improving Musical Accompaniment Co-creation via Diffusion Transformers
by: Nistal, Javier, et al.
Published: (2024)
Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
by: Grachten, Maarten, et al.
Published: (2025)
MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
by: Ma, Guobin, et al.
Published: (2025)
SynthVC: Leveraging Synthetic Data for End-to-End Low Latency Streaming Voice Conversion
by: Guo, Zhao, et al.
Published: (2025)
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)
Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning
by: Tian, Wenjie, et al.
Published: (2026)
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
by: Hong, Zhiqing, et al.
Published: (2024)
A Neural Score Follower for Computer Accompaniment of Polyphonic Musical Instruments
by: Pillay, Ashwin
Published: (2025)
Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models
by: Nistal, Javier, et al.
Published: (2024)
Semantic-Aware Interruption Detection in Spoken Dialogue Systems: Benchmark, Metric, and Model
by: Xia, Kangxiang, et al.
Published: (2026)
Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
by: Ning, Ziqian, et al.
Published: (2024)
Improving Real-Time Music Accompaniment Separation with MMDenseNet
by: Wang, Chun-Hsiang, et al.
Published: (2024)
SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement
by: Li, Xingchen, et al.
Published: (2025)
VoiceSculptor: Your Voice, Designed By You
by: Hu, Jingbin, et al.
Published: (2026)
MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline
by: Tsai, Fang-Duo, et al.
Published: (2026)
DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)
REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
by: Jiang, Yuepeng, et al.
Published: (2025)
Bass Accompaniment Generation via Latent Diffusion
by: Pasini, Marco, et al.
Published: (2024)
Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling
by: Zhao, Jingwei, et al.
Published: (2023)
ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis
by: Ni-Hahn, Stephen, et al.
Published: (2025)
Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
by: Ou, Longshen, et al.
Published: (2024)
DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)
U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding
by: Wang, Ziqian, et al.
Published: (2025)
Melodia: Training-Free Music Editing Guided by Attention Probing in Diffusion Models
by: Yang, Yi, et al.
Published: (2025)
Semantic-Aware Interpretable Multimodal Music Auto-Tagging
by: Patakis, Andreas, et al.
Published: (2025)
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix
by: Yao, Jixun, et al.
Published: (2024)
Accent-VITS: accent transfer for end-to-end TTS
by: Ma, Linhan, et al.
Published: (2023)
Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
by: Niu, Zhikang, et al.
Published: (2025)
Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition
by: Mu, Bingshen, et al.
Published: (2025)
Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge
by: Wang, Chengyou, et al.
Published: (2026)
Similarity-Guided Diffusion for Long-Gap Music Inpainting
by: Turland, Sean, et al.
Published: (2025)