:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Huakang; Cheng, Wenkai; Ma, Guobin; Hao, Chunbo; Xia, Yuxuan; Wei, Mengqi; Zhao, Zhixian; Zhu, Pengcheng; Zhang, Hanbing; Xie, Lei
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2605.17414

Similar Items

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)

YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
by: Hao, Chunbo, et al.
Published: (2026)

DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization
by: Chen, Huakang, et al.
Published: (2025)

SongEval: A Benchmark Dataset for Song Aesthetics Evaluation
by: Yao, Jixun, et al.
Published: (2025)

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)

SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision
by: Hao, Chunbo, et al.
Published: (2025)

OmniCodec: Low Frame Rate Universal Audio Codec with Semantic-Acoustic Disentanglement
by: Hu, Jingbin, et al.
Published: (2026)

EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs
by: Tian, Wenjie, et al.
Published: (2026)

The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
by: Ma, Guobin, et al.
Published: (2026)

Improving Musical Accompaniment Co-creation via Diffusion Transformers
by: Nistal, Javier, et al.
Published: (2024)

Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
by: Grachten, Maarten, et al.
Published: (2025)

MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
by: Ma, Guobin, et al.
Published: (2025)

SynthVC: Leveraging Synthetic Data for End-to-End Low Latency Streaming Voice Conversion
by: Guo, Zhao, et al.
Published: (2025)

MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)

Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning
by: Tian, Wenjie, et al.
Published: (2026)

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
by: Hong, Zhiqing, et al.
Published: (2024)

A Neural Score Follower for Computer Accompaniment of Polyphonic Musical Instruments
by: Pillay, Ashwin
Published: (2025)

Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models
by: Nistal, Javier, et al.
Published: (2024)

Semantic-Aware Interruption Detection in Spoken Dialogue Systems: Benchmark, Metric, and Model
by: Xia, Kangxiang, et al.
Published: (2026)

Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
by: Ning, Ziqian, et al.
Published: (2024)

Improving Real-Time Music Accompaniment Separation with MMDenseNet
by: Wang, Chun-Hsiang, et al.
Published: (2024)

SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement
by: Li, Xingchen, et al.
Published: (2025)

VoiceSculptor: Your Voice, Designed By You
by: Hu, Jingbin, et al.
Published: (2026)

MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline
by: Tsai, Fang-Duo, et al.
Published: (2026)

DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)

REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
by: Jiang, Yuepeng, et al.
Published: (2025)

Bass Accompaniment Generation via Latent Diffusion
by: Pasini, Marco, et al.
Published: (2024)

Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling
by: Zhao, Jingwei, et al.
Published: (2023)

ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis
by: Ni-Hahn, Stephen, et al.
Published: (2025)

Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
by: Ou, Longshen, et al.
Published: (2024)

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)

U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding
by: Wang, Ziqian, et al.
Published: (2025)

Melodia: Training-Free Music Editing Guided by Attention Probing in Diffusion Models
by: Yang, Yi, et al.
Published: (2025)

Semantic-Aware Interpretable Multimodal Music Auto-Tagging
by: Patakis, Andreas, et al.
Published: (2025)

Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix
by: Yao, Jixun, et al.
Published: (2024)

Accent-VITS: accent transfer for end-to-end TTS
by: Ma, Linhan, et al.
Published: (2023)

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
by: Niu, Zhikang, et al.
Published: (2025)

Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition
by: Mu, Bingshen, et al.
Published: (2025)

Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge
by: Wang, Chengyou, et al.
Published: (2026)

Similarity-Guided Diffusion for Long-Gap Music Inpainting
by: Turland, Sean, et al.
Published: (2025)