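| Main authors: | Wang, Tianrui; Ma, Ziyang; Peng, Yizhou; Wang, Haoyu; Niu, Zhikang; Huang, Zikang; Wu, Yihao; Chao, Yi-Wen; Jiang, Yu; Lu, Yuheng; Yang, Guanrou; Li, Xuanchen; Liu, Hexin; Qiang, Chunyu; Gong, Cheng; Yang, Yifan; Liu, Tianchi; Wang, Junyu; Hou, Nana; Ge, Meng; You, Fuming; Yang, Wei; Sun, Zhongqian; Hu, Haifeng; Wang, Xiaobao; Chng, Eng Siong; Chen, Xie; Wang, Longbiao; Dang, Jianwu |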
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Audio and Speech Processing |
| Online access: | https://arxiv.org/abs/2605.09413 |
| Field | Value |
|---|---|
| author | Wang, Tianrui; Ma, Ziyang; Peng, Yizhou; Wang, Haoyu; Niu, Zhikang; Huang, Zikang; Wu, Yihao; Chao, Yi-Wen; Jiang, Yu; Lu, Yuheng; Yang, Guanrou; Li, Xuanchen; Liu, Hexin; Qiang, Chunyu; Gong, Cheng; Yang, Yifan; Liu, Tianchi; Wang, Junyu; Hou, Nana; Ge, Meng; You, Fuming; Yang, Wei; Sun, Zhongqian; Hu, Haifeng; Wang, Xiaobao; Chng, Eng Siong; Chen, Xie; Wang, Longbiao; Dang, Jianwu |
| contents | Evaluating expressive speech remains challenging, as existing methods mainly assess emotional intensity and overlook whether a speech sample is expressively appropriate for its contextual setting. This limitation hinders reliable evaluation of speech systems used in narrative-driven and interactive applications, such as audiobooks and conversational agents. We introduce CEAEval, a Context-rich framework for Evaluating Expressive Appropriateness in speech, which assesses whether a speech sample expressively aligns with the underlying communicative intent implied by its discourse-level narrative context. To support this task, we construct CEAEval-D, the first context-rich speech dataset with real human performances in Mandarin conversational speech, providing narrative descriptions together with fifteen dimensions of human annotations covering expressive attributes and expressive appropriateness. We further develop CEAEval-M, a model that integrates knowledge distillation, planner-based multi-model collaboration, adaptive audio attention bias, and reinforcement learning to perform context-rich expressive appropriateness evaluation. Experiments on a human-annotated test set demonstrate that CEAEval-M substantially outperforms existing speech evaluation and analysis systems. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_09413 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Evaluating the Expressive Appropriateness of Speech in Rich Contexts |
| topic | Audio and Speech Processing |
| url | https://arxiv.org/abs/2605.09413 |