Bibliographic Details
Main Authors: Wang, Yunxiao, Liu, Meng, Jiang, Kaiyu, Wen, Bin, Yang, Fan, Gao, Tingting, Liao, Lizi
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2508.09521
_version_ 1866908948482752512
author Wang, Yunxiao
Liu, Meng
Jiang, Kaiyu
Wen, Bin
Yang, Fan
Gao, Tingting
Liao, Lizi
contents Emotional support conversations require more than fluent responses. Supporters need to understand the seeker's situation and emotions, adopt an appropriate strategy, and respond in a natural, human-like manner. Despite advances in large language models, current systems often lack structured, psychology-informed reasoning. Additionally, it is challenging to enhance these systems through reinforcement learning because of unreliable reward signals. Moreover, reinforcement fine-tuning can amplify repetitive response patterns. We propose structured empathetic reasoning, which breaks support into three steps: conversation history analysis, multimodal emotional state inference, and strategy selection, prior to generating the final reply. To implement this, we introduce SER, a fine-grained dataset with step-level correctness labels and pairwise response preferences. We then present PEER, which uses GRPO with UnifiReward, a unified process-outcome reward model for evaluating both reasoning steps and final responses in multi-turn interactions. To reduce repetition, we enhance data with personality-based rewriting and down-weight redundant outputs. Comprehensive experiments show improved empathy, strategy alignment, and human-likeness without sacrificing diversity.
format Preprint
id arxiv_https___arxiv_org_abs_2508_09521
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
Wang, Yunxiao
Liu, Meng
Jiang, Kaiyu
Wen, Bin
Yang, Fan
Gao, Tingting
Liao, Lizi
Computation and Language
Artificial Intelligence
title PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2508.09521