:: Library Catalog

Bibliographic Details
Main Authors:	An, Yanjie, Zhao, Yuxiang, Zhang, Yichi, Zheng, Qixi, Tu, Yujie, Deng, Keqi, Yu, Kai, Chen, Xie
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.30792

Similar Items

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
by: Deng, Keqi, et al.
Published: (2025)

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling
by: Zheng, Qixi, et al.
Published: (2025)

Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
by: Deng, Keqi, et al.
Published: (2024)

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
by: Chen, Yushen, et al.
Published: (2024)

Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition
by: Deng, Keqi, et al.
Published: (2023)

X-VC: Zero-shot Streaming Voice Conversion in Codec Space
by: Zheng, Qixi, et al.
Published: (2026)

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens
by: Zhao, Jinzheng, et al.
Published: (2024)

Anonymization, Not Elimination: Utility-Preserved Speech Anonymization
by: Xiao, Yunchong, et al.
Published: (2026)

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
by: Chen, Yushen, et al.
Published: (2026)

Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
by: Niu, Zhikang, et al.
Published: (2025)

Open-Source System for Multilingual Translation and Cloned Speech Synthesis
by: Cámara, Mateo, et al.
Published: (2025)

Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)

Rethinking Flow and Diffusion Bridge Models for Speech Enhancement
by: Wang, Dahan, et al.
Published: (2026)

Augmenting Open-Vocabulary Dysarthric Speech Assessment with Human Perceptual Supervision
by: Jia, Kaimeng, et al.
Published: (2025)

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
by: Tu, Wenming, et al.
Published: (2025)

Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
by: Deng, Keqi, et al.
Published: (2024)

SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
by: Wang, Huimeng, et al.
Published: (2026)

S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation
by: Yan, Ruiqi, et al.
Published: (2026)

UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search
by: Rong, Xiaobin, et al.
Published: (2025)

Acoustic BPE for Speech Generation with Discrete Tokens
by: Shen, Feiyu, et al.
Published: (2023)

SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement
by: Li, Xingchen, et al.
Published: (2025)

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)

BoSS: Beyond-Semantic Speech
by: Wang, Qing, et al.
Published: (2025)

TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation
by: Liu, Wei, et al.
Published: (2025)

Position: Towards Responsible Evaluation for Text-to-Speech
by: Yang, Yifan, et al.
Published: (2025)

Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)

Frequency-mix Knowledge Distillation for Fake Speech Detection
by: Fan, Cunhang, et al.
Published: (2024)

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)

Semantic MIMO Systems for Speech-to-Text Transmission
by: Weng, Zhenzi, et al.
Published: (2024)

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
by: Du, Jiayu, et al.
Published: (2024)

Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
by: Moritz, Niko, et al.
Published: (2024)

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)

Complex Recurrent Variational Autoencoder with Application to Speech Enhancement
by: Xie, Yuying, et al.
Published: (2022)

MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
by: Deng, Chengxi, et al.
Published: (2025)

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
by: Wang, Xinsheng, et al.
Published: (2025)

Robust Semantic Communications for Speech Transmission
by: Weng, Zhenzi, et al.
Published: (2024)

LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models
by: Zhao, Zhiyuan, et al.
Published: (2026)

SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
by: Zheng, Youqiang, et al.
Published: (2024)