| Main Authors: | Hao, Chunbo, Yuan, Ruibin, Yao, Jixun, Deng, Qixin, Bai, Xinyi, Wang, Yanbo, Xue, Wei, Xie, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.02797 |
Similar Items
SongEval: A Benchmark Dataset for Song Aesthetics Evaluation
by: Yao, Jixun, et al.
Published: (2025)
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization
by: Chen, Huakang, et al.
Published: (2025)
The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge
by: Ma, Guobin, et al.
Published: (2026)
DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)
MPO: Multidimensional Preference Optimization for Language Model-based Text-to-Speech
by: Xia, Kangxiang, et al.
Published: (2025)
S2Accompanist: A Semantic-Aware and Structure-Guided Diffusion Model for Music Accompaniment Generation
by: Chen, Huakang, et al.
Published: (2026)
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix
by: Yao, Jixun, et al.
Published: (2024)
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)
StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
by: Guo, Dake, et al.
Published: (2025)
SongTrans: An unified song transcription and alignment method for lyrics and notes
by: Wu, Siwei, et al.
Published: (2024)
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
by: Hou, Siyuan, et al.
Published: (2024)
YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
by: Hao, Chunbo, et al.
Published: (2026)
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2024)
MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement
by: Yao, Jixun, et al.
Published: (2024)
Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)
Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction
by: Du, Renmingyue, et al.
Published: (2024)
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)
DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)
MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
by: Ma, Guobin, et al.
Published: (2025)
Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
by: Ning, Ziqian, et al.
Published: (2024)
UniFlow: Unifying Speech Front-End Tasks via Continuous Generative Modeling
by: Wang, Ziqian, et al.
Published: (2025)
Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding
by: Wu, Shangda, et al.
Published: (2026)
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
by: Xia, Kangxiang, et al.
Published: (2024)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs
by: Suda, Hitoshi, et al.
Published: (2024)
TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure
by: He, Qi, et al.
Published: (2025)
NTU-NPU System for Voice Privacy 2024 Challenge
by: Kuzmin, Nikita, et al.
Published: (2024)
NPU-NTU System for Voice Privacy 2024 Challenge
by: Yao, Jixun, et al.
Published: (2024)
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
by: Hong, Zhiqing, et al.
Published: (2024)
Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
by: Pan, Yu, et al.
Published: (2024)
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
by: Zhou, Ziya, et al.
Published: (2024)
Intelligent Text-Conditioned Music Generation
by: Xie, Zhouyao, et al.
Published: (2024)
Automatic Live Music Song Identification Using Multi-level Deep Sequence Similarity Learning
by: Hakala, Aapo, et al.
Published: (2025)
Semi-Supervised Contrastive Learning of Musical Representations
by: Guinot, Julien, et al.
Published: (2024)
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
by: Wu, Shangda, et al.
Published: (2025)
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge
by: Guo, Dake, et al.
Published: (2024)
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)