:: Library Catalog

Bibliographic Details
Main Authors:	Hao, Chunbo, Yuan, Ruibin, Yao, Jixun, Deng, Qixin, Bai, Xinyi, Wang, Yanbo, Xue, Wei, Xie, Lei
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.02797

Similar Items

SongEval: A Benchmark Dataset for Song Aesthetics Evaluation
by: Yao, Jixun, et al.
Published: (2025)

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)

DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization
by: Chen, Huakang, et al.
Published: (2025)

The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
by: Ma, Guobin, et al.
Published: (2026)

DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)

MPO: Multidimensional Preference Optimization for Language Model-based Text-to-Speech
by: Xia, Kangxiang, et al.
Published: (2025)

S2Accompanist: A Semantic-Aware and Structure-Guided Diffusion Model for Music Accompaniment Generation
by: Chen, Huakang, et al.
Published: (2026)

Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix
by: Yao, Jixun, et al.
Published: (2024)

EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)

StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
by: Guo, Dake, et al.
Published: (2025)

SongTrans: An unified song transcription and alignment method for lyrics and notes
by: Wu, Siwei, et al.
Published: (2024)

Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
by: Hou, Siyuan, et al.
Published: (2024)

YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
by: Hao, Chunbo, et al.
Published: (2026)

DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2024)

MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement
by: Yao, Jixun, et al.
Published: (2024)

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)

Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction
by: Du, Renmingyue, et al.
Published: (2024)

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)

MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
by: Ma, Guobin, et al.
Published: (2025)

Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
by: Ning, Ziqian, et al.
Published: (2024)

UniFlow: Unifying Speech Front-End Tasks via Continuous Generative Modeling
by: Wang, Ziqian, et al.
Published: (2025)

Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding
by: Wu, Shangda, et al.
Published: (2026)

KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
by: Xia, Kangxiang, et al.
Published: (2024)

S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)

FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs
by: Suda, Hitoshi, et al.
Published: (2024)

TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure
by: He, Qi, et al.
Published: (2025)

NTU-NPU System for Voice Privacy 2024 Challenge
by: Kuzmin, Nikita, et al.
Published: (2024)

NPU-NTU System for Voice Privacy 2024 Challenge
by: Yao, Jixun, et al.
Published: (2024)

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
by: Hong, Zhiqing, et al.
Published: (2024)

Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
by: Pan, Yu, et al.
Published: (2024)

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
by: Zhou, Ziya, et al.
Published: (2024)

Intelligent Text-Conditioned Music Generation
by: Xie, Zhouyao, et al.
Published: (2024)

Automatic Live Music Song Identification Using Multi-level Deep Sequence Similarity Learning
by: Hakala, Aapo, et al.
Published: (2025)

Semi-Supervised Contrastive Learning of Musical Representations
by: Guinot, Julien, et al.
Published: (2024)

CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
by: Wu, Shangda, et al.
Published: (2025)

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge
by: Guo, Dake, et al.
Published: (2024)

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)