| Main Authors: | Pritzen, Julia, Gref, Michael, Zühlke, Dietlind, Schmidt, Christoph |
|---|---|
| Format: | Preprint |
| Published: | 2021 |
| Online Access: | https://arxiv.org/abs/2105.12708 |
Similar Items
Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech
by: Kolani, Yakov, et al.
Published: (2025)
Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning
by: Ohnaka, Hien, et al.
Published: (2025)
DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition
by: Lee, Wonjun, et al.
Published: (2025)
Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling
by: Yeo, Eunjung, et al.
Published: (2026)
Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training
by: Dong, Lukuan, et al.
Published: (2024)
DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
by: Poli, Maxime, et al.
Published: (2026)
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)
Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)
GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
by: Zhang, Harry, et al.
Published: (2025)
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
by: Yang, Runyan, et al.
Published: (2024)
Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
by: Gállego, Gerard I., et al.
Published: (2025)
Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
by: Peng, Yifan, et al.
Published: (2024)
TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
by: Anh, Tran Nguyen, et al.
Published: (2025)
JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis
by: Cha, Jun-Hyeok, et al.
Published: (2025)
Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges
by: Cornell, Samuele, et al.
Published: (2025)
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
by: Hu, Jiliang, et al.
Published: (2025)
Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)
Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
by: Karapiperis, Sotirios, et al.
Published: (2024)
Weight Factorization and Centralization for Continual Learning in Speech Recognition
by: Ugan, Enes Yavuz, et al.
Published: (2025)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
Efficient Compression of Multitask Multilingual Speech Models
by: Ferraz, Thomas Palmeira
Published: (2024)
Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation
by: Yu, Fangxu, et al.
Published: (2024)
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
by: McGuire, Michael
Published: (2025)
Automatic Speech Recognition for Hindi
by: Saha, Anish, et al.
Published: (2024)
Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)
Multilingual Stutter Event Detection for English, German, and Mandarin Speech
by: Haas, Felix, et al.
Published: (2026)
A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)
Speech Recognition Rescoring with Large Speech-Text Foundation Models
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2024)
Distribution-based Emotion Recognition in Conversation
by: Wu, Wen, et al.
Published: (2022)
A Deep Learning Automatic Speech Recognition Model for Shona Language
by: Sirora, Leslie Wellington, et al.
Published: (2025)
Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
by: Wotherspoon, Shannon, et al.
Published: (2024)
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
by: Poli, Maxime, et al.
Published: (2024)
Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)
Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition
by: Vesterbacka, Leonora, et al.
Published: (2025)
Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
by: Lee, Jeehyun, et al.
Published: (2024)