Bibliographic Details
Main authors: Wang, Tianrui, Ma, Ziyang, Peng, Yizhou, Wang, Haoyu, Niu, Zhikang, Huang, Zikang, Wu, Yihao, Chao, Yi-Wen, Jiang, Yu, Lu, Yuheng, Yang, Guanrou, Li, Xuanchen, Liu, Hexin, Qiang, Chunyu, Gong, Cheng, Yang, Yifan, Liu, Tianchi, Wang, Junyu, Hou, Nana, Ge, Meng, You, Fuming, Yang, Wei, Sun, Zhongqian, Hu, Haifeng, Wang, Xiaobao, Chng, Eng Siong, Chen, Xie, Wang, Longbiao, Dang, Jianwu
Format: Preprint
Published: 2026
Subjects: Audio and Speech Processing
Online access: https://arxiv.org/abs/2605.09413
author Wang, Tianrui
Ma, Ziyang
Peng, Yizhou
Wang, Haoyu
Niu, Zhikang
Huang, Zikang
Wu, Yihao
Chao, Yi-Wen
Jiang, Yu
Lu, Yuheng
Yang, Guanrou
Li, Xuanchen
Liu, Hexin
Qiang, Chunyu
Gong, Cheng
Yang, Yifan
Liu, Tianchi
Wang, Junyu
Hou, Nana
Ge, Meng
You, Fuming
Yang, Wei
Sun, Zhongqian
Hu, Haifeng
Wang, Xiaobao
Chng, Eng Siong
Chen, Xie
Wang, Longbiao
Dang, Jianwu
contents Evaluating expressive speech remains challenging, as existing methods mainly assess emotional intensity and overlook whether a speech sample is expressively appropriate for its contextual setting. This limitation hinders reliable evaluation of speech systems used in narrative-driven and interactive applications, such as audiobooks and conversational agents. We introduce CEAEval, a Context-rich framework for Evaluating Expressive Appropriateness in speech, which assesses whether a speech sample expressively aligns with the underlying communicative intent implied by its discourse-level narrative context. To support this task, we construct CEAEval-D, the first context-rich speech dataset with real human performances in Mandarin conversational speech, providing narrative descriptions together with fifteen dimensions of human annotations covering expressive attributes and expressive appropriateness. We further develop CEAEval-M, a model that integrates knowledge distillation, planner-based multi-model collaboration, adaptive audio attention bias, and reinforcement learning to perform context-rich expressive appropriateness evaluation. Experiments on a human-annotated test set demonstrate that CEAEval-M substantially outperforms existing speech evaluation and analysis systems.
format Preprint
id arxiv_https___arxiv_org_abs_2605_09413
institution arXiv
publishDate 2026
record_format arxiv
title Evaluating the Expressive Appropriateness of Speech in Rich Contexts
topic Audio and Speech Processing
url https://arxiv.org/abs/2605.09413