Staff View: The Role of Prosody in Spoken Question Answering :: Library Catalog

Bibliographic Details
Main Authors:	Chi, Jie; de Seyssel, Maureen; Schluter, Natalie
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.05389

_version_	1866913683900203008
author	Chi, Jie; de Seyssel, Maureen; Schluter, Natalie
author_facet	Chi, Jie; de Seyssel, Maureen; Schluter, Natalie
contents	Spoken language understanding research to date has generally taken a heavily text-centric perspective. Most datasets are derived from text that is subsequently synthesized into speech, and most models rely on automatic transcriptions of speech. This comes at the expense of prosody, the additional information carried by the speech signal beyond the phonetics of the words themselves, which is difficult to recover from text alone. In this work, we investigate the role of prosody in Spoken Question Answering. By isolating prosodic and lexical information on the SLUE-SQA-5 dataset, which consists of natural speech, we demonstrate that models trained on prosodic information alone can perform reasonably well by utilizing prosodic cues. However, we find that when lexical information is available, models tend to rely predominantly on it. Our findings suggest that while prosodic cues provide valuable supplementary information, more effective integration methods are required to ensure prosody contributes more significantly alongside lexical features.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_05389
institution	arXiv
publishDate	2025
record_format	arxiv
title	The Role of Prosody in Spoken Question Answering
topic	Computation and Language
url	https://arxiv.org/abs/2502.05389

Similar Items