:: Library Catalog

Bibliographic Details
Main Authors:	Ma, Chengyuan; Jin, Jiawei; Xiong, Ruijie; Jin, Chunxiang; Yan, Canxiang; Yang, Wenming
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.02591

Similar Items

TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
by: Ma, Chengyuan, et al.
Published: (2026)

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
by: Yan, Canxiang, et al.
Published: (2025)

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
by: Cheng, Sitong, et al.
Published: (2025)

AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)

Vevo2: A Unified and Controllable Framework for Speech and Singing Voice Generation
by: Zhang, Xueyao, et al.
Published: (2025)

SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture
by: Sui, Kehan, et al.
Published: (2025)

When Tone and Words Disagree: Towards Robust Speech Emotion Recognition under Acoustic-Semantic Conflict
by: Huang, Dawei, et al.
Published: (2026)

Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu
Published: (2025)

A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis
by: Amir, Javeria, et al.
Published: (2025)

Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis
by: Geng, Yizhong, et al.
Published: (2025)

SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
by: Zhang, Zhisheng, et al.
Published: (2025)

RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis
by: Sui, Kehan, et al.
Published: (2024)

Mitigating Unauthorized Speech Synthesis for Voice Protection
by: Zhang, Zhisheng, et al.
Published: (2024)

UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2025)

Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology
by: Moell, Birger, et al.
Published: (2025)

Emotion-Aware Speech Generation with Character-Specific Voices for Comics
by: Qian, Zhiwen, et al.
Published: (2025)

VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models
by: Chen, Yukun, et al.
Published: (2026)

DSFlow: Dual Supervision and Step-Aware Architecture for One-Step Flow Matching Speech Synthesis
by: Lin, Bin, et al.
Published: (2026)

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
by: Du, Zhihao, et al.
Published: (2024)

Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers
by: Cai, Runyuan, et al.
Published: (2026)

Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
by: Zheng, Junjie, et al.
Published: (2025)

LAPS-Diff: A Diffusion-Based Framework for Singing Voice Synthesis With Language Aware Prosody-Style Guided Learning
by: Dhar, Sandipan, et al.
Published: (2025)

QAMO: Quality-aware Multi-centroid One-class Learning For Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)

Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)

MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors
by: Bao, Guangyin, et al.
Published: (2026)

DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
by: Li, Yinghao Aaron, et al.
Published: (2024)

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
by: Toyin, Hawau Olamide, et al.
Published: (2025)

AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement
by: Sajid, M., et al.
Published: (2025)

Large Speech Model Enabled Semantic Communication
by: Tian, Yun, et al.
Published: (2025)

Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
by: Lau, Hok-Shing, et al.
Published: (2024)

Fairness-Aware Partial-label Domain Adaptation for Voice Classification of Parkinson's and ALS
by: Francesconi, Arianna, et al.
Published: (2026)

An Agent-Based Framework for Automated Higher-Voice Harmony Generation
by: Ganapathy, Nia D'Souza, et al.
Published: (2025)

Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
by: Pan, Yu, et al.
Published: (2024)

Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
by: Yang, Yudong, et al.
Published: (2025)

Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
by: Tiwari, Upasana, et al.
Published: (2025)

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation
by: Thebaud, Thomas, et al.
Published: (2026)

Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu, et al.
Published: (2025)