Bibliographic Details
Main Authors: Luo, Xuan; Yao, Lewei; Zhao, Libo; Hong, Lanqing; Chen, Kai; Tao, Dehua; Tan, Daxin; Xu, Ruifeng; Li, Jing
Format: Preprint
Published: 2026
Subjects: Computation and Language; Human-Computer Interaction
Online Access: https://arxiv.org/abs/2601.10513
author Luo, Xuan
Yao, Lewei
Zhao, Libo
Hong, Lanqing
Chen, Kai
Tao, Dehua
Tan, Daxin
Xu, Ruifeng
Li, Jing
contents While the automatic evaluation of omni-modal large models (OLMs) is essential, assessing empathy remains a significant challenge due to its inherent affectivity. To investigate this challenge, we introduce AEQ-Bench (Audio Empathy Quotient Benchmark), a novel benchmark to systematically assess two core empathetic capabilities of OLMs: (i) generating empathetic responses by comprehending affective cues from multi-modal inputs (audio + text), and (ii) judging the empathy of audio responses without relying on text transcription. Compared to existing benchmarks, AEQ-Bench incorporates two novel settings that vary in context specificity and speech tone. Comprehensive assessment across linguistic and paralinguistic metrics reveals that (1) OLMs trained with audio output capabilities generally outperformed models with text-only outputs, and (2) while OLMs align with human judgments for coarse-grained quality assessment, they remain unreliable for evaluating fine-grained paralinguistic expressiveness.
format Preprint
id arxiv_https___arxiv_org_abs_2601_10513
institution arXiv
publishDate 2026
record_format arxiv
title AEQ-Bench: Measuring Empathy of Omni-Modal Large Models
topic Computation and Language
Human-Computer Interaction
url https://arxiv.org/abs/2601.10513