:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Jiatao; Tang, Xing; Duan, Xiaoyue; Feng, Yutang; Zhang, Jinchao; Zhou, Jie
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.08233

Similar Items

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
by: Guo, Wenxiang, et al.
Published: (2025)

Diffusion-based Symbolic Music Generation with Structured State Space Models
by: Yuan, Shenghua, et al.
Published: (2025)

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)

NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation
by: Ni, Qinke, et al.
Published: (2026)

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
by: Zhang, Yu, et al.
Published: (2023)

Melodic and Metrical Elements of Expressiveness in Hindustani Vocal Music
by: Bhake, Yash, et al.
Published: (2025)

Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
by: Kim, Tae-Woo, et al.
Published: (2022)

Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review
by: Zhang, Hong, et al.
Published: (2024)

CartoonSing: Unifying Human and Nonhuman Timbres in Singing Generation
by: Han, Jionghao, et al.
Published: (2025)

Research on Piano Timbre Transformation System Based on Diffusion Model
by: Hsu, Chun-Chieh, et al.
Published: (2026)

CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance
by: Zhao, Junchuan, et al.
Published: (2025)

BiSinger: Bilingual Singing Voice Synthesis
by: Zhou, Huali, et al.
Published: (2023)

SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture
by: Sui, Kehan, et al.
Published: (2025)

Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)

IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
by: Zhou, Siyi, et al.
Published: (2025)

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
by: Wang, Yongqi, et al.
Published: (2024)

A Semantic Timbre Dataset for the Electric Guitar
by: Cameron, Joseph, et al.
Published: (2026)

MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision
by: Chen, Jiatao, et al.
Published: (2024)

The First Voice Timbre Attribute Detection Challenge
by: Chen, Liping, et al.
Published: (2025)

UniVocal: Unified Speech-Singing Code-Switching Synthesis
by: Shi, Yufei, et al.
Published: (2026)

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)

YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance
by: Hao, Chunbo, et al.
Published: (2026)

GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
by: Li, Zehua Kcriss, et al.
Published: (2024)

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)

Evaluating Latent Space Structure in Timbre VAEs: A Comparative Study of Unsupervised, Descriptor-Conditioned, and Perceptual Feature-Conditioned Models
by: Cameron, Joseph, et al.
Published: (2026)

Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches
by: Pan, Changhao, et al.
Published: (2026)

Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems
by: Chen, Leduo, et al.
Published: (2026)

LaDA-Band: Language Diffusion Models for Vocal-to-Accompaniment Generation
by: Wang, Qi, et al.
Published: (2026)

NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations
by: Xue, Liumeng, et al.
Published: (2026)

DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions
by: Chen, Weidong, et al.
Published: (2025)

QvTAD: Differential Relative Attribute Learning for Voice Timbre Attribute Detection
by: Wu, Zhiyu, et al.
Published: (2025)

Timbre Difference Capturing in Anomalous Sound Detection
by: Nishida, Tomoya, et al.
Published: (2024)

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
by: Feng, Pengchao, et al.
Published: (2025)

MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity
by: Kim, Sungjae, et al.
Published: (2022)

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
by: Zheng, Junjie, et al.
Published: (2025)

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
by: Chen, Shihao, et al.
Published: (2024)

Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
by: Cho, Deok-Hyeon, et al.
Published: (2026)

WaveTransfer: A Flexible End-to-end Multi-instrument Timbre Transfer with Diffusion
by: Baoueb, Teysir, et al.
Published: (2024)

Assessing the Alignment of Audio Representations with Timbre Similarity Ratings
by: Tian, Haokun, et al.
Published: (2025)

Singer separation for karaoke content generation
by: Lin, Hsuan-Yu, et al.
Published: (2021)