Staff View: AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator :: Library Catalog

Bibliographic Details
Main Authors:	Fan, Zhihao; Tang, Jialong; Chen, Wei; Wang, Siyuan; Wei, Zhongyu; Xi, Jun; Huang, Fei; Zhou, Jingren
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.09742

_version_	1866913406891589632
author	Fan, Zhihao; Tang, Jialong; Chen, Wei; Wang, Siyuan; Wei, Zhongyu; Xi, Jun; Huang, Fei; Zhou, Jingren
author_facet	Fan, Zhihao; Tang, Jialong; Chen, Wei; Wang, Siyuan; Wei, Zhongyu; Xi, Jun; Huang, Fei; Zhou, Jingren
contents	Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel in medical question answering benchmarks. However, their real-world clinical application remains limited due to the complexities of doctor-patient interactions. To address this, we introduce AI Hospital, a multi-agent framework simulating dynamic medical interactions between Doctor as player and NPCs including Patient, Examiner, Chief Physician. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation (MVME) benchmark, utilizing high-quality Chinese medical records and NPCs to evaluate LLMs' performance in symptom collection, examination recommendations, and diagnoses. Additionally, a dispute resolution collaborative mechanism is proposed to enhance diagnostic accuracy through iterative discussions. Despite improvements, current LLMs exhibit significant performance gaps in multi-turn interactions compared to one-step approaches. Our findings highlight the need for further research to bridge these gaps and improve LLMs' clinical diagnostic capabilities. Our data, code, and experimental results are all open-sourced at https://github.com/LibertFan/AI_Hospital.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_09742
institution	arXiv
publishDate	2024
record_format	arxiv
title	AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator
topic	Computation and Language
url	https://arxiv.org/abs/2402.09742
