| Main Authors: | Lou, Haowei, Huang, Chengkai, Paik, Hye-young, Hu, Yongquan, Quigley, Aaron, Hu, Wen, Yao, Lina |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.20113 |
Similar Items
ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech
by: Lou, Haowei, et al.
Published: (2026)
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)
ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation
by: Lou, Haowei, et al.
Published: (2025)
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech
by: Lou, Haowei, et al.
Published: (2024)
Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration
by: Lou, Haowei, et al.
Published: (2024)
LatentSpeech: Latent Diffusion for Text-To-Speech Generation
by: Lou, Haowei, et al.
Published: (2024)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
by: Min, Anna, et al.
Published: (2025)
Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
by: Lu, Wenhuan, et al.
Published: (2025)
Representation Purification for End-to-End Speech Translation
by: Zhang, Chengwei, et al.
Published: (2024)
A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
by: Meier, Fiona, et al.
Published: (2025)
Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
by: Wang, Jialing, et al.
Published: (2026)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
End-to-End Speech-to-Text Translation: A Survey
by: Sethiya, Nivedita, et al.
Published: (2023)
Frequency-Specific Neural Response and Cross-Correlation Analysis of Envelope Following Responses to Native Speech and Music Using Multichannel EEG Signals: A Case Study
by: Hasan, Md. Mahbub, et al.
Published: (2025)
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)
Probing Human Articulatory Constraints in End-to-End TTS with Reverse and Mismatched Speech-Text Directions
by: Khadse, Parth, et al.
Published: (2026)
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)
ML-ARIS: Multilayer Underwater Acoustic Reconfigurable Intelligent Surface with High-Resolution Reflection Control
by: Pu, Lina, et al.
Published: (2025)
An End-to-End Speech Summarization Using Large Language Model
by: Shang, Hengchao, et al.
Published: (2024)
On Improving Error Resilience of Neural End-to-End Speech Coders
by: Gupta, Kishan, et al.
Published: (2024)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
by: Zhou, Junzuo, et al.
Published: (2024)
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
by: Ahmad, Hawraz A., et al.
Published: (2024)
End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering
by: Hu, Jiliang, et al.
Published: (2025)
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
by: Huang, Wuwei, et al.
Published: (2025)
Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
by: Farhadipour, Aref, et al.
Published: (2023)
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
by: Li, Tianpeng, et al.
Published: (2025)
Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
by: Raimon, Athul, et al.
Published: (2024)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving
by: Wang, Jiawei, et al.
Published: (2025)
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)
ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer
by: Xie, Tong, et al.
Published: (2024)
Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition
by: Kocour, Martin, et al.
Published: (2025)
Leveraging Synthetic Audio Data for End-to-End Low-Resource Speech Translation
by: Moslem, Yasmin
Published: (2024)
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
by: Guimarães, Heitor R., et al.
Published: (2024)
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
by: Cheng, Shanbo, et al.
Published: (2024)
SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
by: Lin, Chyi-Jiunn, et al.
Published: (2024)
Code-Switching in End-to-End Automatic Speech Recognition: A Systematic Literature Review
by: Agro, Maha Tufail, et al.
Published: (2025)