| Main Authors: | Chen, Jiatao, Tang, Xing, Duan, Xiaoyue, Feng, Yutang, Zhang, Jinchao, Zhou, Jie |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08233 |
Similar Items
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
by: Guo, Wenxiang, et al.
Published: (2025)
Diffusion-based Symbolic Music Generation with Structured State Space Models
by: Yuan, Shenghua, et al.
Published: (2025)
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)
NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation
by: Ni, Qinke, et al.
Published: (2026)
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
by: Zhang, Yu, et al.
Published: (2023)
Melodic and Metrical Elements of Expressiveness in Hindustani Vocal Music
by: Bhake, Yash, et al.
Published: (2025)
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
by: Kim, Tae-Woo, et al.
Published: (2022)
Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review
by: Zhang, Hong, et al.
Published: (2024)
CartoonSing: Unifying Human and Nonhuman Timbres in Singing Generation
by: Han, Jionghao, et al.
Published: (2025)
Research on Piano Timbre Transformation System Based on Diffusion Model
by: Hsu, Chun-Chieh, et al.
Published: (2026)
CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance
by: Zhao, Junchuan, et al.
Published: (2025)
BiSinger: Bilingual Singing Voice Synthesis
by: Zhou, Huali, et al.
Published: (2023)
SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture
by: Sui, Kehan, et al.
Published: (2025)
Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
by: Zhou, Siyi, et al.
Published: (2025)
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
by: Wang, Yongqi, et al.
Published: (2024)
A Semantic Timbre Dataset for the Electric Guitar
by: Cameron, Joseph, et al.
Published: (2026)
MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision
by: Chen, Jiatao, et al.
Published: (2024)
The First Voice Timbre Attribute Detection Challenge
by: Chen, Liping, et al.
Published: (2025)
UniVocal: Unified Speech-Singing Code-Switching Synthesis
by: Shi, Yufei, et al.
Published: (2026)
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)
YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
by: Hao, Chunbo, et al.
Published: (2026)
GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
by: Li, Zehua Kcriss, et al.
Published: (2024)
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)
Evaluating Latent Space Structure in Timbre VAEs: A Comparative Study of Unsupervised, Descriptor-Conditioned, and Perceptual Feature-Conditioned Models
by: Cameron, Joseph, et al.
Published: (2026)
Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches
by: Pan, Changhao, et al.
Published: (2026)
Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems
by: Chen, Leduo, et al.
Published: (2026)
LaDA-Band: Language Diffusion Models for Vocal-to-Accompaniment Generation
by: Wang, Qi, et al.
Published: (2026)
NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations
by: Xue, Liumeng, et al.
Published: (2026)
DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions
by: Chen, Weidong, et al.
Published: (2025)
QvTAD: Differential Relative Attribute Learning for Voice Timbre Attribute Detection
by: Wu, Zhiyu, et al.
Published: (2025)
Timbre Difference Capturing in Anomalous Sound Detection
by: Nishida, Tomoya, et al.
Published: (2024)
Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
by: Feng, Pengchao, et al.
Published: (2025)
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity
by: Kim, Sungjae, et al.
Published: (2022)
YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
by: Zheng, Junjie, et al.
Published: (2025)
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
by: Chen, Shihao, et al.
Published: (2024)
Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
by: Cho, Deok-Hyeon, et al.
Published: (2026)
WaveTransfer: A Flexible End-to-end Multi-instrument Timbre Transfer with Diffusion
by: Baoueb, Teysir, et al.
Published: (2024)
Assessing the Alignment of Audio Representations with Timbre Similarity Ratings
by: Tian, Haokun, et al.
Published: (2025)
Singer separation for karaoke content generation
by: Lin, Hsuan-Yu, et al.
Published: (2021)