Saved in:
| Main Authors: | Khadse, Parth, Kopparapu, Sunil Kumar |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.14664 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026)
by: Kopparapu, Sunil Kumar
Published: (2026)
A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
by: Kopparapu, Sunil Kumar, et al.
Published: (2024)
by: Kopparapu, Sunil Kumar, et al.
Published: (2024)
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)
by: Guo, Yinlin, et al.
Published: (2024)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
by: Bataev, Vladimir, et al.
Published: (2025)
Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
by: Tiwari, Upasana, et al.
Published: (2025)
by: Tiwari, Upasana, et al.
Published: (2025)
End-to-End Speech-to-Text Translation: A Survey
by: Sethiya, Nivedita, et al.
Published: (2023)
by: Sethiya, Nivedita, et al.
Published: (2023)
Emotion-Disentangled Embedding Alignment for Noise-Robust and Cross-Corpus Speech Emotion Recognition
by: Tiwari, Upasana, et al.
Published: (2025)
by: Tiwari, Upasana, et al.
Published: (2025)
An investigation of phrase break prediction in an End-to-End TTS system
by: Vadapalli, Anandaswarup
Published: (2023)
by: Vadapalli, Anandaswarup
Published: (2023)
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
by: Ahmad, Hawraz A., et al.
Published: (2024)
by: Ahmad, Hawraz A., et al.
Published: (2024)
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)
by: Chen, Junyang, et al.
Published: (2026)
SpeechAgent: An End-to-End Mobile Infrastructure for Speech Impairment Assistance
by: Lou, Haowei, et al.
Published: (2025)
by: Lou, Haowei, et al.
Published: (2025)
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
by: Min, Anna, et al.
Published: (2025)
by: Min, Anna, et al.
Published: (2025)
Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
by: Wang, Jialing, et al.
Published: (2026)
by: Wang, Jialing, et al.
Published: (2026)
SAND Challenge: Four Approaches for Dysartria Severity Classification
by: Deshpande, Gauri, et al.
Published: (2025)
by: Deshpande, Gauri, et al.
Published: (2025)
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
by: Huang, Wuwei, et al.
Published: (2025)
by: Huang, Wuwei, et al.
Published: (2025)
Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking
by: Vendrame, Katia, et al.
Published: (2025)
by: Vendrame, Katia, et al.
Published: (2025)
Representation Purification for End-to-End Speech Translation
by: Zhang, Chengwei, et al.
Published: (2024)
by: Zhang, Chengwei, et al.
Published: (2024)
Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
by: Lu, Wenhuan, et al.
Published: (2025)
by: Lu, Wenhuan, et al.
Published: (2025)
Deep Speech Synthesis from Multimodal Articulatory Representations
by: Wu, Peter, et al.
Published: (2024)
by: Wu, Peter, et al.
Published: (2024)
Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
by: Zhang, Ziqian, et al.
Published: (2025)
by: Zhang, Ziqian, et al.
Published: (2025)
Improved Dysarthric Speech to Text Conversion via TTS Personalization
by: Mihajlik, Péter, et al.
Published: (2025)
by: Mihajlik, Péter, et al.
Published: (2025)
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)
by: Liu, Huadai, et al.
Published: (2023)
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)
by: Qi, Xin, et al.
Published: (2024)
DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
by: Yin, Kang, et al.
Published: (2025)
by: Yin, Kang, et al.
Published: (2025)
MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)
by: Gumma, Varun, et al.
Published: (2024)
An End-to-End Speech Summarization Using Large Language Model
by: Shang, Hengchao, et al.
Published: (2024)
by: Shang, Hengchao, et al.
Published: (2024)
On Improving Error Resilience of Neural End-to-End Speech Coders
by: Gupta, Kishan, et al.
Published: (2024)
by: Gupta, Kishan, et al.
Published: (2024)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
by: Zhou, Xuanru, et al.
Published: (2024)
WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
by: Zhou, Junzuo, et al.
Published: (2024)
by: Zhou, Junzuo, et al.
Published: (2024)
Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)
by: Liu, Xiaoqian, et al.
Published: (2024)
OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
by: Ren, Yong, et al.
Published: (2026)
by: Ren, Yong, et al.
Published: (2026)
TED-TTS: Training-Free Intra-Utterance Emotion and Duration Control for Text-to-Speech Synthesis
by: Liang, Qifan, et al.
Published: (2026)
by: Liang, Qifan, et al.
Published: (2026)
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
by: Weise, Tobias, et al.
Published: (2024)
by: Weise, Tobias, et al.
Published: (2024)
Pushing the Limits of End-to-End Diarization
by: Broughton, Samuel J., et al.
Published: (2025)
by: Broughton, Samuel J., et al.
Published: (2025)
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
by: Li, Tianpeng, et al.
Published: (2025)
by: Li, Tianpeng, et al.
Published: (2025)
Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
by: Raimon, Athul, et al.
Published: (2024)
by: Raimon, Athul, et al.
Published: (2024)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
by: Huang, Wuwei, et al.
Published: (2025)
LoRP-TTS: Low-Rank Personalized Text-To-Speech
by: Bondaruk, Łukasz, et al.
Published: (2025)
by: Bondaruk, Łukasz, et al.
Published: (2025)
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)
by: Ma, Zhengrui, et al.
Published: (2024)
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
by: Guimarães, Heitor R., et al.
Published: (2024)
by: Guimarães, Heitor R., et al.
Published: (2024)
Similar Items
-
A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026) -
A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
by: Kopparapu, Sunil Kumar, et al.
Published: (2024) -
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024) -
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025) -
Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
by: Tiwari, Upasana, et al.
Published: (2025)