| Main Authors: | van Rensburg, Kyle Janse, van Niekerk, Benjamin, Kamper, Herman |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.03096 |
Similar Items
Revisiting speech segmentation and lexicon learning with better features
by: Kamper, Herman, et al.
Published: (2024)
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
by: Malan, Simon, et al.
Published: (2024)
Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?
by: Malan, Simon, et al.
Published: (2025)
Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
by: Carbonneau, Marc-André, et al.
Published: (2025)
LinearVC: Linear transformations of self-supervised features through the lens of voice conversion
by: Kamper, Herman, et al.
Published: (2025)
Spoken Language Modeling with Duration-Penalized Self-Supervised Units
by: Visser, Nicol, et al.
Published: (2025)
Spoken-Term Discovery using Discrete Speech Units
by: van Niekerk, Benjamin, et al.
Published: (2024)
Disentanglement in a GAN for Unconditional Speech Synthesis
by: Baas, Matthew, et al.
Published: (2023)
Visually Grounded Speech Models have a Mutual Exclusivity Bias
by: Nortje, Leanne, et al.
Published: (2024)
Translating speech with just images
by: Oneata, Dan, et al.
Published: (2024)
Visually grounded few-shot word learning in low-resource settings
by: Nortje, Leanne, et al.
Published: (2023)
Towards few-shot isolated word reading assessment
by: Smit, Reuben, et al.
Published: (2025)
MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model
by: Baas, Matthew, et al.
Published: (2025)
Unsupervised lexicon learning from speech is limited by representations rather than clustering
by: Slabbert, Danel, et al.
Published: (2025)
Feature-based analysis of oral narratives from Afrikaans and isiXhosa children
by: Sharratt, Emma, et al.
Published: (2025)
ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling
by: Visser, Nicol, et al.
Published: (2026)
The mutual exclusivity bias of bilingual visually grounded speech models
by: Oneata, Dan, et al.
Published: (2025)
Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
by: Jacobs, Christiaan, et al.
Published: (2025)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
Linear-Complexity Self-Supervised Learning for Speech Processing
by: Zhang, Shucong, et al.
Published: (2024)
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)
Automatically assessing oral narratives of Afrikaans and isiXhosa children
by: Louw, Retief, et al.
Published: (2025)
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)
Investigation of Speaker Representation for Target-Speaker Speech Processing
by: Ashihara, Takanori, et al.
Published: (2024)
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
by: Nortje, Leanne, et al.
Published: (2024)
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
by: Hwang, Min-Jae, et al.
Published: (2024)
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
by: Komatsu, Ryota, et al.
Published: (2024)
What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
by: Ashihara, Takanori, et al.
Published: (2024)
Self-Supervised Models of Speech Infer Universal Articulatory Kinematics
by: Cho, Cheol Jun, et al.
Published: (2023)
Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)
Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes
by: van Dalen, Rogier C., et al.
Published: (2025)
Interface Design for Self-Supervised Speech Models
by: Shih, Yi-Jen, et al.
Published: (2024)
Optimizing Speech-Input Length for Speaker-Independent Depression Classification
by: Rutowski, Tomasz, et al.
Published: (2024)
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
by: Wang, Yujin, et al.
Published: (2022)
Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models
by: Blandón, María Andrea Cruz, et al.
Published: (2025)
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)
Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)
DiariST: Streaming Speech Translation with Speaker Diarization
by: Yang, Mu, et al.
Published: (2023)
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
by: Peng, Junyi, et al.
Published: (2025)
STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)