Staff View: Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations :: Library Catalog

Bibliographic Details
Main Authors:	Chandra, Mohit; Sriraman, Siddharth; Khanuja, Harneet Singh; Jin, Yiqiao; De Choudhury, Munmun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.20201

_version_	1866910972521742336
author	Chandra, Mohit; Sriraman, Siddharth; Khanuja, Harneet Singh; Jin, Yiqiao; De Choudhury, Munmun
author_facet	Chandra, Mohit; Sriraman, Siddharth; Khanuja, Harneet Singh; Jin, Yiqiao; De Choudhury, Munmun
contents	Limited access to mental healthcare, extended wait times, and the increasing capabilities of Large Language Models (LLMs) have led individuals to turn to LLMs to fulfill their mental health needs. However, the multi-turn mental health conversation capabilities of LLMs remain under-explored. Existing evaluation frameworks typically focus on diagnostic accuracy and win-rates and often overlook alignment with the patient-specific goals, values, and personalities required for meaningful conversations. To address this, we introduce MedAgent, a novel framework for synthetically generating realistic, multi-turn mental health sensemaking conversations, and use it to create the Mental Health Sensemaking Dialogue (MHSD) dataset, comprising over 2,200 patient-LLM conversations. Additionally, we present MultiSenseEval, a holistic framework to evaluate the multi-turn conversation abilities of LLMs in healthcare settings using human-centric criteria. Our findings reveal that frontier reasoning models yield below-par performance for patient-centric communication and struggle with advanced diagnostic capabilities, with an average score of 31%. Additionally, we observed variation in model performance based on the patient's persona and a performance drop with increasing turns in the conversation. Our work provides a comprehensive synthetic data generation framework, a dataset, and an evaluation framework for assessing LLMs in multi-turn mental health conversations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_20201
institution	arXiv
publishDate	2025
record_format	arxiv
title	Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations
topic	Computation and Language
url	https://arxiv.org/abs/2505.20201

Similar Items