:: Library Catalog

Bibliographic Details
Main Authors:	Khadse, Parth; Kopparapu, Sunil Kumar
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2602.14664

Similar Items

A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026)

A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
by: Kopparapu, Sunil Kumar, et al.
Published: (2024)

FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
by: Tiwari, Upasana, et al.
Published: (2025)

End-to-End Speech-to-Text Translation: A Survey
by: Sethiya, Nivedita, et al.
Published: (2023)

Emotion-Disentangled Embedding Alignment for Noise-Robust and Cross-Corpus Speech Emotion Recognition
by: Tiwari, Upasana, et al.
Published: (2025)

An investigation of phrase break prediction in an End-to-End TTS system
by: Vadapalli, Anandaswarup
Published: (2023)

Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
by: Ahmad, Hawraz A., et al.
Published: (2024)

CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)

SpeechAgent: An End-to-End Mobile Infrastructure for Speech Impairment Assistance
by: Lou, Haowei, et al.
Published: (2025)

When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
by: Min, Anna, et al.
Published: (2025)

Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
by: Wang, Jialing, et al.
Published: (2026)

SAND Challenge: Four Approaches for Dysartria Severity Classification
by: Deshpande, Gauri, et al.
Published: (2025)

AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
by: Huang, Wuwei, et al.
Published: (2025)

Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking
by: Vendrame, Katia, et al.
Published: (2025)

Representation Purification for End-to-End Speech Translation
by: Zhang, Chengwei, et al.
Published: (2024)

Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
by: Lu, Wenhuan, et al.
Published: (2025)

Deep Speech Synthesis from Multimodal Articulatory Representations
by: Wu, Peter, et al.
Published: (2024)

Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
by: Zhang, Ziqian, et al.
Published: (2025)

Improved Dysarthric Speech to Text Conversion via TTS Personalization
by: Mihajlik, Péter, et al.
Published: (2025)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)

DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
by: Yin, Kang, et al.
Published: (2025)

MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)

An End-to-End Speech Summarization Using Large Language Model
by: Shang, Hengchao, et al.
Published: (2024)

On Improving Error Resilience of Neural End-to-End Speech Coders
by: Gupta, Kishan, et al.
Published: (2024)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
by: Zhou, Junzuo, et al.
Published: (2024)

Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
by: Ren, Yong, et al.
Published: (2026)

TED-TTS: Training-Free Intra-Utterance Emotion and Duration Control for Text-to-Speech Synthesis
by: Liang, Qifan, et al.
Published: (2026)

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
by: Weise, Tobias, et al.
Published: (2024)

Pushing the Limits of End-to-End Diarization
by: Broughton, Samuel J., et al.
Published: (2025)

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
by: Li, Tianpeng, et al.
Published: (2025)

Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
by: Raimon, Athul, et al.
Published: (2024)

Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)

LoRP-TTS: Low-Rank Personalized Text-To-Speech
by: Bondaruk, Łukasz, et al.
Published: (2025)

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)

An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
by: Guimarães, Heitor R., et al.
Published: (2024)