| Main Authors: | Ma, Chengyuan, Jin, Jiawei, Xiong, Ruijie, Jin, Chunxiang, Yan, Canxiang, Yang, Wenming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02591 |
Similar Items
TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
by: Ma, Chengyuan, et al.
Published: (2026)
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
by: Yan, Canxiang, et al.
Published: (2025)
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)
ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
by: Cheng, Sitong, et al.
Published: (2025)
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)
Vevo2: A Unified and Controllable Framework for Speech and Singing Voice Generation
by: Zhang, Xueyao, et al.
Published: (2025)
SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture
by: Sui, Kehan, et al.
Published: (2025)
When Tone and Words Disagree: Towards Robust Speech Emotion Recognition under Acoustic-Semantic Conflict
by: Huang, Dawei, et al.
Published: (2026)
Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu
Published: (2025)
A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis
by: Amir, Javeria, et al.
Published: (2025)
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis
by: Geng, Yizhong, et al.
Published: (2025)
SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
by: Zhang, Zhisheng, et al.
Published: (2025)
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis
by: Sui, Kehan, et al.
Published: (2024)
Mitigating Unauthorized Speech Synthesis for Voice Protection
by: Zhang, Zhisheng, et al.
Published: (2024)
UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2025)
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology
by: Moell, Birger, et al.
Published: (2025)
Emotion-Aware Speech Generation with Character-Specific Voices for Comics
by: Qian, Zhiwen, et al.
Published: (2025)
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
by: Chen, Yukun, et al.
Published: (2026)
DSFlow: Dual Supervision and Step-Aware Architecture for One-Step Flow Matching Speech Synthesis
by: Lin, Bin, et al.
Published: (2026)
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
by: Du, Zhihao, et al.
Published: (2024)
Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers
by: Cai, Runyuan, et al.
Published: (2026)
Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)
YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
by: Zheng, Junjie, et al.
Published: (2025)
LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning
by: Dhar, Sandipan, et al.
Published: (2025)
QAMO: Quality-aware Multi-centroid One-class Learning For Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)
MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors
by: Bao, Guangyin, et al.
Published: (2026)
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
by: Li, Yinghao Aaron, et al.
Published: (2024)
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)
AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement
by: Sajid, M., et al.
Published: (2025)
Large Speech Model Enabled Semantic Communication
by: Tian, Yun, et al.
Published: (2025)
Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
by: Lau, Hok-Shing, et al.
Published: (2024)
Fairness-Aware Partial-label Domain Adaptation for Voice Classification of Parkinson's and ALS
by: Francesconi, Arianna, et al.
Published: (2026)
An Agent-Based Framework for Automated Higher-Voice Harmony Generation
by: Ganapathy, Nia D'Souza, et al.
Published: (2025)
Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
by: Pan, Yu, et al.
Published: (2024)
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
by: Yang, Yudong, et al.
Published: (2025)
Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
by: Tiwari, Upasana, et al.
Published: (2025)
Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation
by: Thebaud, Thomas, et al.
Published: (2026)
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu, et al.
Published: (2025)