Bibliographic Details
Main Authors: Fan, Zhihao, Tang, Jialong, Chen, Wei, Wang, Siyuan, Wei, Zhongyu, Xi, Jun, Huang, Fei, Zhou, Jingren
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2402.09742
_version_ 1866913406891589632
author Fan, Zhihao
Tang, Jialong
Chen, Wei
Wang, Siyuan
Wei, Zhongyu
Xi, Jun
Huang, Fei
Zhou, Jingren
author_facet Fan, Zhihao
Tang, Jialong
Chen, Wei
Wang, Siyuan
Wei, Zhongyu
Xi, Jun
Huang, Fei
Zhou, Jingren
contents Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel in medical question answering benchmarks. However, their real-world clinical application remains limited due to the complexities of doctor-patient interactions. To address this, we introduce AI Hospital, a multi-agent framework simulating dynamic medical interactions between Doctor as player and NPCs including Patient, Examiner, Chief Physician. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation (MVME) benchmark, utilizing high-quality Chinese medical records and NPCs to evaluate LLMs' performance in symptom collection, examination recommendations, and diagnoses. Additionally, a dispute resolution collaborative mechanism is proposed to enhance diagnostic accuracy through iterative discussions. Despite improvements, current LLMs exhibit significant performance gaps in multi-turn interactions compared to one-step approaches. Our findings highlight the need for further research to bridge these gaps and improve LLMs' clinical diagnostic capabilities. Our data, code, and experimental results are all open-sourced at https://github.com/LibertFan/AI_Hospital.
format Preprint
id arxiv_https___arxiv_org_abs_2402_09742
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator
Fan, Zhihao
Tang, Jialong
Chen, Wei
Wang, Siyuan
Wei, Zhongyu
Xi, Jun
Huang, Fei
Zhou, Jingren
Computation and Language
Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel in medical question answering benchmarks. However, their real-world clinical application remains limited due to the complexities of doctor-patient interactions. To address this, we introduce AI Hospital, a multi-agent framework simulating dynamic medical interactions between Doctor as player and NPCs including Patient, Examiner, Chief Physician. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation (MVME) benchmark, utilizing high-quality Chinese medical records and NPCs to evaluate LLMs' performance in symptom collection, examination recommendations, and diagnoses. Additionally, a dispute resolution collaborative mechanism is proposed to enhance diagnostic accuracy through iterative discussions. Despite improvements, current LLMs exhibit significant performance gaps in multi-turn interactions compared to one-step approaches. Our findings highlight the need for further research to bridge these gaps and improve LLMs' clinical diagnostic capabilities. Our data, code, and experimental results are all open-sourced at https://github.com/LibertFan/AI_Hospital.
title AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator
topic Computation and Language
url https://arxiv.org/abs/2402.09742