Bibliographic Details
Main Authors: Biron, Tirza, Barboy, Moshe, Ben-Artzy, Eran, Golubchik, Alona, Marmor, Yanir, Szekely, Smadar, Winter, Yaron, Harel, David
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2403.03522
author Biron, Tirza
Barboy, Moshe
Ben-Artzy, Eran
Golubchik, Alona
Marmor, Yanir
Szekely, Smadar
Winter, Yaron
Harel, David
contents Non-verbal signals in speech are encoded by prosody and carry information that ranges from conversation action to attitude and emotion. Despite its importance, the principles that govern prosodic structure are not yet adequately understood. This paper offers an analytical schema and a technological proof-of-concept for the categorization of prosodic signals and their association with meaning. The schema interprets surface-representations of multi-layered prosodic events. As a first step towards implementation, we present a classification process that disentangles prosodic phenomena of three orders. It relies on fine-tuning a pre-trained speech recognition model, enabling simultaneous multi-class/multi-label detection. It generalizes over a large variety of spontaneous data, performing on a par with, or superior to, human annotation. In addition to a standardized formalization of prosody, disentangling prosodic patterns can direct a theory of communication and speech organization. A welcome by-product is an interpretation of prosody that will enhance speech- and language-related technologies.
format Preprint
id arxiv_https___arxiv_org_abs_2403_03522
institution arXiv
publishDate 2024
record_format arxiv
title Non-verbal information in spontaneous speech -- towards a new framework of analysis
topic Sound
Computation and Language
Machine Learning
Audio and Speech Processing