:: Library Catalog

Bibliographic Details
Main Authors:	van Rensburg, Kyle Janse; van Niekerk, Benjamin; Kamper, Herman
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2603.03096

Similar Items

Revisiting speech segmentation and lexicon learning with better features
by: Kamper, Herman, et al.
Published: (2024)

Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
by: Malan, Simon, et al.
Published: (2024)

Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?
by: Malan, Simon, et al.
Published: (2025)

Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
by: Carbonneau, Marc-André, et al.
Published: (2025)

LinearVC: Linear transformations of self-supervised features through the lens of voice conversion
by: Kamper, Herman, et al.
Published: (2025)

Spoken Language Modeling with Duration-Penalized Self-Supervised Units
by: Visser, Nicol, et al.
Published: (2025)

Spoken-Term Discovery using Discrete Speech Units
by: van Niekerk, Benjamin, et al.
Published: (2024)

Disentanglement in a GAN for Unconditional Speech Synthesis
by: Baas, Matthew, et al.
Published: (2023)

Visually Grounded Speech Models have a Mutual Exclusivity Bias
by: Nortje, Leanne, et al.
Published: (2024)

Translating speech with just images
by: Oneata, Dan, et al.
Published: (2024)

Visually grounded few-shot word learning in low-resource settings
by: Nortje, Leanne, et al.
Published: (2023)

Towards few-shot isolated word reading assessment
by: Smit, Reuben, et al.
Published: (2025)

MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model
by: Baas, Matthew, et al.
Published: (2025)

Unsupervised lexicon learning from speech is limited by representations rather than clustering
by: Slabbert, Danel, et al.
Published: (2025)

Feature-based analysis of oral narratives from Afrikaans and isiXhosa children
by: Sharratt, Emma, et al.
Published: (2025)

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling
by: Visser, Nicol, et al.
Published: (2026)

The mutual exclusivity bias of bilingual visually grounded speech models
by: Oneata, Dan, et al.
Published: (2025)

Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
by: Jacobs, Christiaan, et al.
Published: (2025)

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)

Linear-Complexity Self-Supervised Learning for Speech Processing
by: Zhang, Shucong, et al.
Published: (2024)

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)

Automatically assessing oral narratives of Afrikaans and isiXhosa children
by: Louw, Retief, et al.
Published: (2025)

DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)

Investigation of Speaker Representation for Target-Speaker Speech Processing
by: Ashihara, Takanori, et al.
Published: (2024)

Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
by: Nortje, Leanne, et al.
Published: (2024)

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
by: Hwang, Min-Jae, et al.
Published: (2024)

Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
by: Komatsu, Ryota, et al.
Published: (2024)

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
by: Ashihara, Takanori, et al.
Published: (2024)

Self-Supervised Models of Speech Infer Universal Articulatory Kinematics
by: Cho, Cheol Jun, et al.
Published: (2023)

Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)

Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes
by: van Dalen, Rogier C., et al.
Published: (2025)

Interface Design for Self-Supervised Speech Models
by: Shih, Yi-Jen, et al.
Published: (2024)

Optimizing Speech-Input Length for Speaker-Independent Depression Classification
by: Rutowski, Tomasz, et al.
Published: (2024)

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
by: Wang, Yujin, et al.
Published: (2022)

Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models
by: Blandón, María Andrea Cruz, et al.
Published: (2025)

Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)

Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)

DiariST: Streaming Speech Translation with Speaker Diarization
by: Yang, Mu, et al.
Published: (2023)

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
by: Peng, Junyi, et al.
Published: (2025)

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)