Library Catalog

Bibliographic Details
Main Authors:	Pritzen, Julia; Gref, Michael; Zühlke, Dietlind; Schmidt, Christoph
Format:	Preprint
Published:	2021
Subjects:	Computation and Language; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2105.12708

Similar Items

Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech
by: Kolani, Yakov, et al.
Published: (2025)

Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning
by: Ohnaka, Hien, et al.
Published: (2025)

DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition
by: Lee, Wonjun, et al.
Published: (2025)

Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling
by: Yeo, Eunjung, et al.
Published: (2026)

Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training
by: Dong, Lukuan, et al.
Published: (2024)

DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
by: Poli, Maxime, et al.
Published: (2026)

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)

Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)

GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
by: Zhang, Harry, et al.
Published: (2025)

PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
by: Yang, Runyan, et al.
Published: (2024)

Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
by: Gállego, Gerard I., et al.
Published: (2025)

Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)

Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
by: Peng, Yifan, et al.
Published: (2024)

TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
by: Anh, Tran Nguyen, et al.
Published: (2025)

JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis
by: Cha, Jun-Hyeok, et al.
Published: (2025)

Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges
by: Cornell, Samuele, et al.
Published: (2025)

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
by: Hu, Jiliang, et al.
Published: (2025)

Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)

Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)

Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
by: Karapiperis, Sotirios, et al.
Published: (2024)

Weight Factorization and Centralization for Continual Learning in Speech Recognition
by: Ugan, Enes Yavuz, et al.
Published: (2025)

SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)

Efficient Compression of Multitask Multilingual Speech Models
by: Ferraz, Thomas Palmeira
Published: (2024)

Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation
by: Yu, Fangxu, et al.
Published: (2024)

Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
by: McGuire, Michael
Published: (2025)

Automatic Speech Recognition for Hindi
by: Saha, Anish, et al.
Published: (2024)

Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)

Multilingual Stutter Event Detection for English, German, and Mandarin Speech
by: Haas, Felix, et al.
Published: (2026)

A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)

Speech Recognition Rescoring with Large Speech-Text Foundation Models
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2024)

Distribution-based Emotion Recognition in Conversation
by: Wu, Wen, et al.
Published: (2022)

A Deep Learning Automatic Speech Recognition Model for Shona Language
by: Sirora, Leslie Wellington, et al.
Published: (2025)

Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
by: Wotherspoon, Shannon, et al.
Published: (2024)

Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
by: Poli, Maxime, et al.
Published: (2024)

Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)

Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition
by: Vesterbacka, Leonora, et al.
Published: (2025)

Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
by: Lee, Jeehyun, et al.
Published: (2024)