:: Library Catalog

Bibliographic Details
Main Authors:	Despotovic, Vladimir; Pocta, Peter; Zgank, Andrej
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2511.04533

Similar Items

Analyzing the relationships between pretraining language, phonetic, tonal, and speaker information in self-supervised speech models
by: Gubian, Michele, et al.
Published: (2025)

Curriculum learning for self-supervised speaker verification
by: Heo, Hee-Soo, et al.
Published: (2022)

Human-CLAP: Human-perception-based contrastive language-audio pretraining
by: Takano, Taisei, et al.
Published: (2025)

Improving acoustic drone detection generalization through pretraining and data augmentation
by: Reuter, Paul M., et al.
Published: (2026)

Investigating self-supervised features for expressive, multilingual voice conversion
by: Martín-Cortinas, Álvaro, et al.
Published: (2025)

Online incremental learning for audio classification using a pretrained audio model
by: Mulimani, Manjunath, et al.
Published: (2025)

Evaluating pretrained speech embedding systems for dysarthria detection across heterogenous datasets
by: Wihlborg, Lovisa, et al.
Published: (2025)

Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling
by: Gao, Wenmiao, et al.
Published: (2025)

Positive and negative sampling strategies for self-supervised learning on audio-video data
by: Wang, Shanshan, et al.
Published: (2024)

Fine-tune the pretrained ATST model for sound event detection
by: Shao, Nian, et al.
Published: (2023)

On the social bias of speech self-supervised models
by: Lin, Yi-Cheng, et al.
Published: (2024)

AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)

Towards generalisable and calibrated synthetic speech detection with self-supervised representations
by: Pascu, Octavian, et al.
Published: (2023)

From perception to production: how acoustic invariance facilitates articulatory learning in a self-supervised vocal imitation model
by: Lavechin, Marvin, et al.
Published: (2025)

The role of audio-visual integration in the time course of phonetic encoding in self-supervised speech models
by: Wang, Yi, et al.
Published: (2025)

Encoding of lexical tone in self-supervised models of spoken language
by: Shen, Gaofei, et al.
Published: (2024)

Tempo estimation as fully self-supervised binary classification
by: Henkel, Florian, et al.
Published: (2024)

LinearVC: Linear transformations of self-supervised features through the lens of voice conversion
by: Kamper, Herman, et al.
Published: (2025)

GLAP: General contrastive audio-text pretraining across domains and languages
by: Dinkel, Heinrich, et al.
Published: (2025)

Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording
by: Wang, Bo, et al.
Published: (2024)

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)

Automated data curation for self-supervised learning in underwater acoustic analysis
by: Hummel, Hilde I, et al.
Published: (2025)

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)

Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer
by: Shechtman, Slava, et al.
Published: (2024)

Impairments are Clustered in Latents of Deep Neural Network-based Speech Quality Models
by: Cumlin, Fredrik, et al.
Published: (2025)

A study on weakly-supervised training approaches for phoneme-level pronunciation scoring
by: Vidal, Jazmín, et al.
Published: (2026)

Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization
by: Huang, Wanting, et al.
Published: (2026)

Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter
by: Xi, Yu, et al.
Published: (2024)

Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
by: Yadav, Sarthak, et al.
Published: (2025)

AfriHuBERT: A self-supervised speech representation model for African languages
by: Alabi, Jesujoba O., et al.
Published: (2024)

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
by: Maharana, Sarthak Kumar, et al.
Published: (2023)

STONE: Self-supervised Tonality Estimator
by: Kong, Yuexuan, et al.
Published: (2024)

emg2speech: Synthesizing speech from electromyography using self-supervised speech models
by: Gowda, Harshavardhana T., et al.
Published: (2025)

Equivariance-based self-supervised learning for audio signal recovery from clipped measurements
by: Sechaud, Victor, et al.
Published: (2024)

Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts
by: Kuhlmann, Michael, et al.
Published: (2026)

Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation
by: Wang, Zhong-Qiu
Published: (2024)

Tracking the emergence of linguistic structure in self-supervised models learning from speech
by: Kloots, Marianne de Heer, et al.
Published: (2026)

Universal Preference-Score-based Pairwise Speech Quality Assessment
by: Shi, Yu-Fei, et al.
Published: (2025)