MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Tianrui; Ma, Ziyang; Peng, Yizhou; Wang, Haoyu; Niu, Zhikang; Huang, Zikang; Wu, Yihao; Chao, Yi-Wen; Jiang, Yu; Lu, Yuheng; Yang, Guanrou; Li, Xuanchen; Liu, Hexin; Qiang, Chunyu; Gong, Cheng; Yang, Yifan; Liu, Tianchi; Wang, Junyu; Hou, Nana; Ge, Meng; You, Fuming; Yang, Wei; Sun, Zhongqian; Hu, Haifeng; Wang, Xiaobao; Chng, Eng Siong; Chen, Xie; Wang, Longbiao; Dang, Jianwu
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2605.09413

_version_	1866915998787960832
author	Wang, Tianrui; Ma, Ziyang; Peng, Yizhou; Wang, Haoyu; Niu, Zhikang; Huang, Zikang; Wu, Yihao; Chao, Yi-Wen; Jiang, Yu; Lu, Yuheng; Yang, Guanrou; Li, Xuanchen; Liu, Hexin; Qiang, Chunyu; Gong, Cheng; Yang, Yifan; Liu, Tianchi; Wang, Junyu; Hou, Nana; Ge, Meng; You, Fuming; Yang, Wei; Sun, Zhongqian; Hu, Haifeng; Wang, Xiaobao; Chng, Eng Siong; Chen, Xie; Wang, Longbiao; Dang, Jianwu
contents	Evaluating expressive speech remains challenging, as existing methods mainly assess emotional intensity and overlook whether a speech sample is expressively appropriate for its contextual setting. This limitation hinders reliable evaluation of speech systems used in narrative-driven and interactive applications, such as audiobooks and conversational agents. We introduce CEAEval, a Context-rich framework for Evaluating Expressive Appropriateness in speech, which assesses whether a speech sample expressively aligns with the underlying communicative intent implied by its discourse-level narrative context. To support this task, we construct CEAEval-D, the first context-rich speech dataset with real human performances in Mandarin conversational speech, providing narrative descriptions together with fifteen dimensions of human annotations covering expressive attributes and expressive appropriateness. We further develop CEAEval-M, a model that integrates knowledge distillation, planner-based multi-model collaboration, adaptive audio attention bias, and reinforcement learning to perform context-rich expressive appropriateness evaluation. Experiments on a human-annotated test set demonstrate that CEAEval-M substantially outperforms existing speech evaluation and analysis systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_09413
institution	arXiv
publishDate	2026
record_format	arxiv
title	Evaluating the Expressive Appropriateness of Speech in Rich Contexts
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2605.09413
