Staff View: PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yunxiao; Liu, Meng; Jiang, Kaiyu; Wen, Bin; Yang, Fan; Gao, Tingting; Liao, Lizi
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.09521

_version_	1866908948482752512
author	Wang, Yunxiao; Liu, Meng; Jiang, Kaiyu; Wen, Bin; Yang, Fan; Gao, Tingting; Liao, Lizi
author_facet	Wang, Yunxiao; Liu, Meng; Jiang, Kaiyu; Wen, Bin; Yang, Fan; Gao, Tingting; Liao, Lizi
contents	Emotional support conversations require more than fluent responses. Supporters need to understand the seeker's situation and emotions, adopt an appropriate strategy, and respond in a natural, human-like manner. Despite advances in large language models, current systems often lack structured, psychology-informed reasoning. Additionally, it is challenging to enhance these systems through reinforcement learning because of unreliable reward signals. Moreover, reinforcement fine-tuning can amplify repetitive response patterns. We propose structured empathetic reasoning, which breaks support into three steps: conversation history analysis, multimodal emotional state inference, and strategy selection, prior to generating the final reply. To implement this, we introduce SER, a fine-grained dataset with step-level correctness labels and pairwise response preferences. We then present PEER, which uses GRPO with UnifiReward, a unified process-outcome reward model for evaluating both reasoning steps and final responses in multi-turn interactions. To reduce repetition, we enhance data with personality-based rewriting and down-weight redundant outputs. Comprehensive experiments show improved empathy, strategy alignment, and human-likeness without sacrificing diversity.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_09521
institution	arXiv
publishDate	2025
record_format	arxiv
title	PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2508.09521

Similar Items