Library Catalog

Bibliographic Details
Main Authors:	Oh, Hyunseok; Yi, Juheon; Lee, Youngki
Format:	Preprint
Published:	2024
Subjects:	Sound; Computation and Language; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.00888

Similar Items

Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
by: Liu, Andy T., et al.
Published: (2024)

SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)

RobustSpeechFlow: Learning Robust Text-to-Speech Trajectories via Augmentation-based Contrastive Flow Matching
by: Yang, Jinhyeok, et al.
Published: (2026)

Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription
by: Vieting, Peter, et al.
Published: (2023)

Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
by: Karapiperis, Sotirios, et al.
Published: (2024)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

Length-Aware Rotary Position Embedding for Text-Speech Alignment
by: Kim, Hyeongju, et al.
Published: (2025)

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)

Property Neurons in Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2024)

Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages
by: Varadhan, Praveen Srinivasa, et al.
Published: (2025)

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
by: Wang, Pu, et al.
Published: (2026)

Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
by: Yoo, Suho, et al.
Published: (2025)

SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)

Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)

Speech Robust Bench: A Robustness Benchmark For Speech Recognition
by: Shah, Muhammad A., et al.
Published: (2024)

Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
by: Nagpal, Chirag, et al.
Published: (2024)

Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2022)

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
by: Kim, Minchan, et al.
Published: (2024)

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)

Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services
by: Venkateshperumal, Danush, et al.
Published: (2024)

Speculative End-Turn Detector for Efficient Speech Chatbot Assistant
by: Ok, Hyunjong, et al.
Published: (2025)

CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)

The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
by: Ljubešić, Nikola, et al.
Published: (2024)

Modeling Overlapped Speech with Shuffles
by: Wiesner, Matthew, et al.
Published: (2026)

Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
by: Wang, Siyin, et al.
Published: (2024)

TTSDS -- Text-to-Speech Distribution Score
by: Minixhofer, Christoph, et al.
Published: (2024)

Textually Pretrained Speech Language Models
by: Hassid, Michael, et al.
Published: (2023)

PAST: Phonetic-Acoustic Speech Tokenizer
by: Har-Tuv, Nadav, et al.
Published: (2025)

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)

Coupling Speech Encoders with Downstream Text Models
by: Chelba, Ciprian, et al.
Published: (2024)

Text to Speech System for Meitei Mayek Script
by: Irengbam, Gangular Singh, et al.
Published: (2025)

Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)

Adapting Language Balance in Code-Switching Speech
by: Ugan, Enes Yavuz, et al.
Published: (2025)

FlashSpeech: Efficient Zero-Shot Speech Synthesis
by: Ye, Zhen, et al.
Published: (2024)

Energy-Based Models with Applications to Speech and Language Processing
by: Ou, Zhijian
Published: (2024)

Disentangling Textual and Acoustic Features of Neural Speech Representations
by: Mohebbi, Hosein, et al.
Published: (2024)

Moonshine: Speech Recognition for Live Transcription and Voice Commands
by: Jeffries, Nat, et al.
Published: (2024)