:: Library Catalog

Bibliographic Details
Main Authors:	Törö, Tuukka; Suni, Antti; Šimko, Juraj
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2506.08564

Similar Items

Beyond the binary: Limitations and possibilities of gender-related speech technology research
by: Sanchez, Ariadna, et al.
Published: (2024)

emg2speech: Synthesizing speech from electromyography using self-supervised speech models
by: Gowda, Harshavardhana T., et al.
Published: (2025)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

Convoifilter: A case study of doing cocktail party speech recognition
by: Nguyen, Thai-Binh, et al.
Published: (2023)

Translating speech with just images
by: Oneata, Dan, et al.
Published: (2024)

Can large audio language models understand child stuttering speech? speech summarization, and source separation
by: Okocha, Chibuzor, et al.
Published: (2025)

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
by: Wang, Hsuan-Fu, et al.
Published: (2024)

A Theoretical Framework for Acoustic Neighbor Embeddings
by: Jeon, Woojay
Published: (2024)

Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)

Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)

Semantic enrichment towards efficient speech representations
by: Laperrière, Gaëlle, et al.
Published: (2023)

Can Whisper perform speech-based in-context learning?
by: Wang, Siyin, et al.
Published: (2023)

Revisiting speech segmentation and lexicon learning with better features
by: Kamper, Herman, et al.
Published: (2024)

LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)

Self-consistent context aware conformer transducer for speech recognition
by: Kolokolov, Konstantin, et al.
Published: (2024)

An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)

Direct Punjabi to English speech translation using discrete units
by: Kaur, Prabhjot, et al.
Published: (2024)

Transferable speech-to-text large language model alignment module
by: Wu, Boyong, et al.
Published: (2024)

Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection
by: Cui, Ziyun, et al.
Published: (2023)

Natural language guidance of high-fidelity text-to-speech with synthetic annotations
by: Lyth, Dan, et al.
Published: (2024)

An experiment on an automated literature survey of data-driven speech enhancement methods
by: Santos, Arthur dos, et al.
Published: (2023)

Unsupervised lexicon learning from speech is limited by representations rather than clustering
by: Slabbert, Danel, et al.
Published: (2025)

Exploring the limits of decoder-only models trained on public speech recognition corpora
by: Gupta, Ankit, et al.
Published: (2024)

asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation
by: Sedukhin, Oleg, et al.
Published: (2026)

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
by: Garg, Abhinav, et al.
Published: (2024)

Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)

Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
by: Dhakal, Manish, et al.
Published: (2024)

Korean aegyo speech shows systematic F1 increase to signal childlike qualities
by: Kim, Ji-eun, et al.
Published: (2026)

AfriHuBERT: A self-supervised speech representation model for African languages
by: Alabi, Jesujoba O., et al.
Published: (2024)

Multilingual acoustic word embeddings for zero-resource languages
by: Jacobs, Christiaan
Published: (2024)

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)

High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
by: Banerjee, Sourav, et al.
Published: (2024)

Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023)

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)

Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
by: Cheng, Shanbo, et al.
Published: (2025)

PRODIS -- a speech database and a phoneme-based language model for the study of predictability effects in Polish
by: Malisz, Zofia, et al.
Published: (2024)

Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
by: Özyilmaz, Ömer Tarik, et al.
Published: (2025)

Exploring the anatomy of articulation rate in spontaneous English speech: relationships between utterance length effects and social factors
by: Tanner, James, et al.
Published: (2024)

A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding
by: Laperrière, Gaëlle, et al.
Published: (2024)