Staff View: Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Sewon; Kim, Jiwon; Shin, Seungwoo; Chung, Hyejin; Moon, Daeun; Kwon, Yejin; Yoon, Hyunsoo
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2508.16921

_version_	1866912839580516352
author	Kim, Sewon; Kim, Jiwon; Shin, Seungwoo; Chung, Hyejin; Moon, Daeun; Kwon, Yejin; Yoon, Hyunsoo
author_facet	Kim, Sewon; Kim, Jiwon; Shin, Seungwoo; Chung, Hyejin; Moon, Daeun; Kwon, Yejin; Yoon, Hyunsoo
contents	Large Language Models (LLMs) are increasingly engaged in emotionally vulnerable conversations that extend beyond information seeking to moments of personal distress. As they adopt affective tones and simulate empathy, they risk creating the illusion of genuine relational connection. We term this phenomenon Affective Hallucination, referring to emotionally immersive responses that evoke false social presence despite the model's lack of affective capacity. To address this, we introduce AHaBench, a benchmark of 500 mental-health-related prompts with expert-informed reference responses, evaluated along three dimensions: Emotional Enmeshment, Illusion of Presence, and Fostering Overdependence. We further release AHaPairs, a 5K-instance preference dataset enabling Direct Preference Optimization (DPO) for alignment with emotionally responsible behavior. DPO fine-tuning substantially reduces affective hallucination without compromising reasoning performance. The Pearson correlation coefficient between GPT-4o and human judgments is also strong (r=0.85), indicating that human evaluations confirm AHaBench as an effective diagnostic tool. This work establishes affective hallucination as a distinct safety concern and provides resources for developing LLMs that are both factually reliable and psychologically safe. AHaBench and AHaPairs are accessible via https://huggingface.co/datasets/o0oMiNGo0o/AHaBench, and code for fine-tuning and evaluation is available at https://github.com/0oOMiNGOo0/AHaBench. Warning: This paper contains examples of mental health-related language that may be emotionally distressing.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_16921
institution	arXiv
publishDate	2025
record_format	arxiv
title	Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs
topic	Computation and Language
url	https://arxiv.org/abs/2508.16921

Similar Items