| Main Authors: | Oh, Hyunseok, Yi, Juheon, Lee, Youngki |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.00888 |
Similar Items
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
by: Liu, Andy T., et al.
Published: (2024)
SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)
RobustSpeechFlow: Learning Robust Text-to-Speech Trajectories via Augmentation-based Contrastive Flow Matching
by: Yang, Jinhyeok, et al.
Published: (2026)
Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription
by: Vieting, Peter, et al.
Published: (2023)
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
by: Karapiperis, Sotirios, et al.
Published: (2024)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
Length-Aware Rotary Position Embedding for Text-Speech Alignment
by: Kim, Hyeongju, et al.
Published: (2025)
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)
Property Neurons in Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2024)
Phir Hera Fairy: An English Fairytaler is a Strong Faker of Fluent Speech in Low-Resource Indian Languages
by: Varadhan, Praveen Srinivasa, et al.
Published: (2025)
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)
SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
by: Wang, Pu, et al.
Published: (2026)
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
by: Yoo, Suho, et al.
Published: (2025)
SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)
Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
by: Shah, Muhammad A., et al.
Published: (2024)
Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning
by: Nagpal, Chirag, et al.
Published: (2024)
Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2022)
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
by: Kim, Minchan, et al.
Published: (2024)
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
by: Wang, Xiaofei, et al.
Published: (2023)
Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services
by: Venkateshperumal, Danush, et al.
Published: (2024)
Speculative End-Turn Detector for Efficient Speech Chatbot Assistant
by: Ok, Hyunjong, et al.
Published: (2025)
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
by: Ljubešić, Nikola, et al.
Published: (2024)
Modeling Overlapped Speech with Shuffles
by: Wiesner, Matthew, et al.
Published: (2026)
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
by: Wang, Siyin, et al.
Published: (2024)
TTSDS -- Text-to-Speech Distribution Score
by: Minixhofer, Christoph, et al.
Published: (2024)
Textually Pretrained Speech Language Models
by: Hassid, Michael, et al.
Published: (2023)
PAST: Phonetic-Acoustic Speech Tokenizer
by: Har-Tuv, Nadav, et al.
Published: (2025)
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)
Coupling Speech Encoders with Downstream Text Models
by: Chelba, Ciprian, et al.
Published: (2024)
Text to Speech System for Meitei Mayek Script
by: Irengbam, Gangular Singh, et al.
Published: (2025)
Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)
Adapting Language Balance in Code-Switching Speech
by: Ugan, Enes Yavuz, et al.
Published: (2025)
FlashSpeech: Efficient Zero-Shot Speech Synthesis
by: Ye, Zhen, et al.
Published: (2024)
Energy-Based Models with Applications to Speech and Language Processing
by: Ou, Zhijian
Published: (2024)
Disentangling Textual and Acoustic Features of Neural Speech Representations
by: Mohebbi, Hosein, et al.
Published: (2024)
Moonshine: Speech Recognition for Live Transcription and Voice Commands
by: Jeffries, Nat, et al.
Published: (2024)