Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Ahbabi, Hamdan Al; Marti, Gautier; AlMarri, Saeed; Elfadel, Ibrahim
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2502.19387

_version_	1866910846432575488
author	Ahbabi, Hamdan Al; Marti, Gautier; AlMarri, Saeed; Elfadel, Ibrahim
author_facet	Ahbabi, Hamdan Al; Marti, Gautier; AlMarri, Saeed; Elfadel, Ibrahim
contents	Self-supervised learning models for speech processing, such as wav2vec2, HuBERT, WavLM, and Whisper, generate embeddings that capture both linguistic and paralinguistic information, making it challenging to analyze tone independently of spoken content. In this work, we introduce a method for disentangling paralinguistic features from linguistic content by regressing speech embeddings onto their corresponding text embeddings and using the residuals as a representation of vocal tone. We evaluate this approach across multiple self-supervised speech embeddings, demonstrating that residual embeddings significantly improve tone classification performance compared to raw speech embeddings. Our results show that this method enhances linear separability, enabling improved classification even with simple models such as logistic regression. Visualization of the residual embeddings further confirms the successful removal of linguistic information while preserving tone-related features. These findings highlight the potential of residual embeddings for applications in sentiment analysis, speaker characterization, and paralinguistic speech processing.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_19387
institution	arXiv
publishDate	2025
record_format	arxiv
title	Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis
topic	Machine Learning; Computation and Language
url	https://arxiv.org/abs/2502.19387
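
The abstract in the contents field describes regressing speech embeddings onto their text embeddings and keeping the residuals as a tone representation. The record does not say which regression model the authors use, so the following is only a minimal sketch under stated assumptions: ordinary least squares with an intercept, fitted and applied on the same synthetic data. The function name `residual_embeddings`, the array shapes, and the random data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def residual_embeddings(speech, text):
    """Regress speech embeddings onto text embeddings and return the residuals.

    Assumption: a plain linear least-squares fit with an intercept; the
    catalog record does not specify the regression model actually used.
    """
    # Append a column of ones so the fit also removes the mean offset.
    design = np.hstack([text, np.ones((text.shape[0], 1))])
    # One linear map from text space to speech space, solved by least squares.
    coef, *_ = np.linalg.lstsq(design, speech, rcond=None)
    # Whatever the text embeddings cannot explain linearly is the residual.
    return speech - design @ coef

# Synthetic stand-ins for real embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
n, d_text, d_speech = 200, 8, 16
text = rng.normal(size=(n, d_text))            # "linguistic" part
tone = rng.normal(size=(n, d_speech))          # "paralinguistic" part
mixing = rng.normal(size=(d_text, d_speech))   # how text leaks into speech
speech = text @ mixing + 0.5 * tone

residuals = residual_embeddings(speech, text)
```

By construction, least-squares residuals are uncorrelated with the text embeddings on the data used for the fit. With real embeddings one would likely fit the regression on a training split and apply it to held-out utterances, then feed the residuals to a simple classifier such as logistic regression, as the abstract suggests.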

Similar Items