Bibliographic Details
Main Authors: Kim, Sewon, Kim, Jiwon, Shin, Seungwoo, Chung, Hyejin, Moon, Daeun, Kwon, Yejin, Yoon, Hyunsoo
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2508.16921
_version_ 1866912839580516352
author Kim, Sewon
Kim, Jiwon
Shin, Seungwoo
Chung, Hyejin
Moon, Daeun
Kwon, Yejin
Yoon, Hyunsoo
contents Large Language Models (LLMs) are increasingly engaged in emotionally vulnerable conversations that extend beyond information seeking to moments of personal distress. As they adopt affective tones and simulate empathy, they risk creating the illusion of genuine relational connection. We term this phenomenon Affective Hallucination, referring to emotionally immersive responses that evoke false social presence despite the model's lack of affective capacity. To address this, we introduce AHaBench, a benchmark of 500 mental-health-related prompts with expert-informed reference responses, evaluated along three dimensions: Emotional Enmeshment, Illusion of Presence, and Fostering Overdependence. We further release AHaPairs, a 5K-instance preference dataset enabling Direct Preference Optimization (DPO) for alignment with emotionally responsible behavior. DPO fine-tuning substantially reduces affective hallucination without compromising reasoning performance. The Pearson correlation coefficient between GPT-4o and human judgments is strong (r = 0.85), indicating that human evaluations confirm AHaBench as an effective diagnostic tool. This work establishes affective hallucination as a distinct safety concern and provides resources for developing LLMs that are both factually reliable and psychologically safe. AHaBench and AHaPairs are accessible via https://huggingface.co/datasets/o0oMiNGo0o/AHaBench, and code for fine-tuning and evaluation is available at https://github.com/0oOMiNGOo0/AHaBench. Warning: This paper contains examples of mental health-related language that may be emotionally distressing.
format Preprint
id arxiv_https___arxiv_org_abs_2508_16921
institution arXiv
publishDate 2025
record_format arxiv
title Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs
topic Computation and Language
url https://arxiv.org/abs/2508.16921