:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Sheikh, Shakeel, Marmaroli, Patrick, Sahidullah, MD, Ouni, Slim, Hirsch, Fabrice, Leal, Goncalo, Schuller, Bjorn W
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence Computation and Language Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2605.01101
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Impact of Speech Mode in Automatic Pathological Speech Detection
by: Sheikh, Shakeel A., et al.
Published: (2024)

Multiview Canonical Correlation Analysis for Automatic Pathological Speech Detection
by: Kaloga, Yacouba, et al.
Published: (2024)

Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)

Abusive Speech Detection in Indic Languages Using Acoustic Features
by: Spiesberger, Anika A., et al.
Published: (2024)

Overview of Automatic Speech Analysis and Technologies for Neurodegenerative Disorders: Diagnosis and Assistive Applications
by: Sheikh, Shakeel A., et al.
Published: (2025)

Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models
by: Jing, Xin, et al.
Published: (2024)

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
by: Qi, Tianhua, et al.
Published: (2026)

Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers
by: Latif, Siddique, et al.
Published: (2023)

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)

Contrastive Knowledge Distillation for Embedding Refinement in Personalized Speech Enhancement
by: Serre, Thomas, et al.
Published: (2026)

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
by: He, Xiangheng, et al.
Published: (2024)

Direct Speech to Speech Translation: A Review
by: Sarim, Mohammad, et al.
Published: (2025)

Enhancing Speech Emotion Recognition Through Differentiable Architecture Search
by: Rajapakshe, Thejan, et al.
Published: (2023)

Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)

Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2022)

Selfsupervised learning for pathological speech detection
by: Sheikh, Shakeel Ahmad
Published: (2024)

Representation Learning with Parameterised Quantum Circuits for Advancing Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2025)

Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
by: Schrüfer, Oliver, et al.
Published: (2024)

Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
by: Kounadis-Bastian, Dionyssos, et al.
Published: (2024)

Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
by: Cheng, Jiaming, et al.
Published: (2025)

Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations
by: Zaiem, Salah, et al.
Published: (2024)

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)

Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)

Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation
by: Lu, Cheng, et al.
Published: (2024)

Personalized Neural Speech Codec
by: Jang, Inseon, et al.
Published: (2024)

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
by: Li, Jialu, et al.
Published: (2024)

Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets
by: Moussa, Denise, et al.
Published: (2023)

Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
by: Nespoli, Francesco, et al.
Published: (2024)

HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
by: Dong, Zhongren, et al.
Published: (2024)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2024)

Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)

Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric
by: Ogg, Mattson, et al.
Published: (2025)

Enhanced Reverberation as Supervision for Unsupervised Speech Separation
by: Saijo, Kohei, et al.
Published: (2024)

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
by: Shakeel, Muhammad, et al.
Published: (2024)

Variational Autoencoder for Personalized Pathological Speech Enhancement
by: Hou, Mingchi, et al.
Published: (2025)

Neural Speech Tracking in a Virtual Acoustic Environment: Audio-Visual Benefit for Unscripted Continuous Speech
by: Daeglau, Mareike, et al.
Published: (2025)

Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
by: Huang, Ziling, et al.
Published: (2025)

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)