| Main Authors: | Despotovic, Vladimir, Pocta, Peter, Zgank, Andrej |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.04533 |
Similar Items
Analyzing the relationships between pretraining language, phonetic, tonal, and speaker information in self-supervised speech models
by: Gubian, Michele, et al.
Published: (2025)
Curriculum learning for self-supervised speaker verification
by: Heo, Hee-Soo, et al.
Published: (2022)
Human-CLAP: Human-perception-based contrastive language-audio pretraining
by: Takano, Taisei, et al.
Published: (2025)
Improving acoustic drone detection generalization through pretraining and data augmentation
by: Reuter, Paul M., et al.
Published: (2026)
Investigating self-supervised features for expressive, multilingual voice conversion
by: Martín-Cortinas, Álvaro, et al.
Published: (2025)
Online incremental learning for audio classification using a pretrained audio model
by: Mulimani, Manjunath, et al.
Published: (2025)
Evaluating pretrained speech embedding systems for dysarthria detection across heterogenous datasets
by: Wihlborg, Lovisa, et al.
Published: (2025)
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling
by: Gao, Wenmiao, et al.
Published: (2025)
Positive and negative sampling strategies for self-supervised learning on audio-video data
by: Wang, Shanshan, et al.
Published: (2024)
Fine-tune the pretrained ATST model for sound event detection
by: Shao, Nian, et al.
Published: (2023)
On the social bias of speech self-supervised models
by: Lin, Yi-Cheng, et al.
Published: (2024)
AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)
Towards generalisable and calibrated synthetic speech detection with self-supervised representations
by: Pascu, Octavian, et al.
Published: (2023)
From perception to production: how acoustic invariance facilitates articulatory learning in a self-supervised vocal imitation model
by: Lavechin, Marvin, et al.
Published: (2025)
The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models
by: Wang, Yi, et al.
Published: (2025)
Encoding of lexical tone in self-supervised models of spoken language
by: Shen, Gaofei, et al.
Published: (2024)
Tempo estimation as fully self-supervised binary classification
by: Henkel, Florian, et al.
Published: (2024)
LinearVC: Linear transformations of self-supervised features through the lens of voice conversion
by: Kamper, Herman, et al.
Published: (2025)
GLAP: General contrastive audio-text pretraining across domains and languages
by: Dinkel, Heinrich, et al.
Published: (2025)
Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording
by: Wang, Bo, et al.
Published: (2024)
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)
Automated data curation for self-supervised learning in underwater acoustic analysis
by: Hummel, Hilde I, et al.
Published: (2025)
Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)
Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)
Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer
by: Shechtman, Slava, et al.
Published: (2024)
Impairments are Clustered in Latents of Deep Neural Network-based Speech Quality Models
by: Cumlin, Fredrik, et al.
Published: (2025)
A study on weakly-supervised training approaches for phoneme-level pronunciation scoring
by: Vidal, Jazmín, et al.
Published: (2026)
Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization
by: Huang, Wanting, et al.
Published: (2026)
Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter
by: Xi, Yu, et al.
Published: (2024)
Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)
AfriHuBERT: A self-supervised speech representation model for African languages
by: Alabi, Jesujoba O., et al.
Published: (2024)
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
by: Maharana, Sarthak Kumar, et al.
Published: (2023)
STONE: Self-supervised Tonality Estimator
by: Kong, Yuexuan, et al.
Published: (2024)
emg2speech: Synthesizing speech from electromyography using self-supervised speech models
by: Gowda, Harshavardhana T., et al.
Published: (2025)
Equivariance-based self-supervised learning for audio signal recovery from clipped measurements
by: Sechaud, Victor, et al.
Published: (2024)
Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts
by: Kuhlmann, Michael, et al.
Published: (2026)
Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation
by: Wang, Zhong-Qiu
Published: (2024)
Tracking the emergence of linguistic structure in self-supervised models learning from speech
by: Kloots, Marianne de Heer, et al.
Published: (2026)
Universal Preference-Score-based Pairwise Speech Quality Assessment
by: Shi, Yu-Fei, et al.
Published: (2025)