| Main Authors: | Hou, Ruihui; Chen, Shencheng; Fan, Yongqi; Yu, Guangya; Zhu, Lifeng; Sun, Jing; Liu, Jingping; Ruan, Tong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2408.10039 |
| _version_ | 1866912157632823296 |
|---|---|
| author | Hou, Ruihui; Chen, Shencheng; Fan, Yongqi; Yu, Guangya; Zhu, Lifeng; Sun, Jing; Liu, Jingping; Ruan, Tong |
| author_facet | Hou, Ruihui; Chen, Shencheng; Fan, Yongqi; Yu, Guangya; Zhu, Lifeng; Sun, Jing; Liu, Jingping; Ruan, Tong |
| contents | Clinical diagnosis is critical in medical practice, typically requiring a continuous and evolving process that includes primary diagnosis, differential diagnosis, and final diagnosis. However, most existing clinical diagnostic tasks are single-step processes, which do not align with the complex multi-step diagnostic procedures found in real-world clinical settings. In this paper, we propose a Chinese clinical diagnostic benchmark, called MSDiagnosis. This benchmark consists of 2,225 cases from 12 departments, covering tasks such as primary diagnosis, differential diagnosis, and final diagnosis. Additionally, we propose a novel and effective framework. This framework combines forward inference, backward inference, reflection, and refinement, enabling the large language model to self-evaluate and adjust its diagnostic results. To this end, we test open-source models, closed-source models, and our proposed framework. The experimental results demonstrate the effectiveness of the proposed method. We also provide a comprehensive experimental analysis and suggest future research directions for this task. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2408_10039 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | MSDiagnosis: A Benchmark for Evaluating Large Language Models in Multi-Step Clinical Diagnosis |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2408.10039 |