| Main Authors: | An, Yanjie, Zhao, Yuxiang, Zhang, Yichi, Zheng, Qixi, Tu, Yujie, Deng, Keqi, Yu, Kai, Chen, Xie |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.30792 |
Similar Items
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
by: Deng, Keqi, et al.
Published: (2025)
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling
by: Zheng, Qixi, et al.
Published: (2025)
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
by: Deng, Keqi, et al.
Published: (2024)
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
by: Chen, Yushen, et al.
Published: (2024)
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition
by: Deng, Keqi, et al.
Published: (2023)
X-VC: Zero-shot Streaming Voice Conversion in Codec Space
by: Zheng, Qixi, et al.
Published: (2026)
Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens
by: Zhao, Jinzheng, et al.
Published: (2024)
Anonymization, Not Elimination: Utility-Preserved Speech Anonymization
by: Xiao, Yunchong, et al.
Published: (2026)
Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
by: Chen, Yushen, et al.
Published: (2026)
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)
Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
by: Niu, Zhikang, et al.
Published: (2025)
Open-Source System for Multilingual Translation and Cloned Speech Synthesis
by: Cámara, Mateo, et al.
Published: (2025)
Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)
Rethinking Flow and Diffusion Bridge Models for Speech Enhancement
by: Wang, Dahan, et al.
Published: (2026)
Augmenting Open-Vocabulary Dysarthric Speech Assessment with Human Perceptual Supervision
by: Jia, Kaimeng, et al.
Published: (2025)
UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
by: Tu, Wenming, et al.
Published: (2025)
Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
by: Deng, Keqi, et al.
Published: (2024)
SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
by: Wang, Huimeng, et al.
Published: (2026)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation
by: Yan, Ruiqi, et al.
Published: (2026)
UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search
by: Rong, Xiaobin, et al.
Published: (2025)
Acoustic BPE for Speech Generation with Discrete Tokens
by: Shen, Feiyu, et al.
Published: (2023)
SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement
by: Li, Xingchen, et al.
Published: (2025)
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)
BoSS: Beyond-Semantic Speech
by: Wang, Qing, et al.
Published: (2025)
TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation
by: Liu, Wei, et al.
Published: (2025)
Position: Towards Responsible Evaluation for Text-to-Speech
by: Yang, Yifan, et al.
Published: (2025)
Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)
Frequency-mix Knowledge Distillation for Fake Speech Detection
by: Fan, Cunhang, et al.
Published: (2024)
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)
Semantic MIMO Systems for Speech-to-Text Transmission
by: Weng, Zhenzi, et al.
Published: (2024)
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
by: Du, Jiayu, et al.
Published: (2024)
Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
by: Moritz, Niko, et al.
Published: (2024)
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
by: Xu, Kai-Tuo, et al.
Published: (2025)
Complex Recurrent Variational Autoencoder with Application to Speech Enhancement
by: Xie, Yuying, et al.
Published: (2022)
MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
by: Deng, Chengxi, et al.
Published: (2025)
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
by: Wang, Xinsheng, et al.
Published: (2025)
Robust Semantic Communications for Speech Transmission
by: Weng, Zhenzi, et al.
Published: (2024)
LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models
by: Zhao, Zhiyuan, et al.
Published: (2026)
SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
by: Zheng, Youqiang, et al.
Published: (2024)