Bibliographic Details
Main Authors: Li, Haolin; Jiang, Shuyang; Zhang, Ruipeng; Yao, Jiangchao; Zhang, Ya; Wang, Yanfeng
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2604.11547
author Li, Haolin
Jiang, Shuyang
Zhang, Ruipeng
Yao, Jiangchao
Zhang, Ya
Wang, Yanfeng
contents While large language models hold promise for complex medical applications, their development is hindered by the scarcity of high-quality reasoning data. To address this issue, existing approaches typically distill chain-of-thought reasoning traces from large proprietary models via supervised fine-tuning, then conduct reinforcement learning (RL). These methods exhibit limited improvement on underrepresented domains like rare diseases while incurring substantial costs from generating complex reasoning chains. To efficiently enhance medical reasoning, we propose MedSSR, a Medical Knowledge-enhanced data Synthesis and Semi-supervised Reinforcement learning framework. Our framework first employs rare disease knowledge to synthesize distribution-controllable reasoning questions. We then utilize the policy model itself to generate high-quality pseudo-labels. This enables a two-stage, intrinsic-to-extrinsic training paradigm: self-supervised RL on the pseudo-labeled synthetic data, followed by supervised RL on the human-annotated real data. MedSSR scales model training efficiently without relying on costly trace distillation. Extensive experiments on Qwen and Llama demonstrate that our method outperforms existing methods across ten medical benchmarks, achieving up to +5.93% gain on rare-disease tasks. Our code is available at https://github.com/tdlhl/MedSSR.
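The abstract does not say how the policy model produces its pseudo-labels or how rewards are computed. As a rough illustration only, the sketch below assumes majority voting over sampled answers and a binary exact-match reward; the function names, the agreement threshold, and the `sample_answers` callable are all hypothetical and are not taken from the MedSSR code.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def normalize(answer: str) -> str:
    """Canonical form used to compare answers (assumed: trimmed, upper-case)."""
    return answer.strip().upper()


def pseudo_label(samples: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Majority-vote pseudo-label over answers sampled from the policy model.

    Returns None when the most frequent answer is not frequent enough.
    Majority voting and the threshold are assumptions of this sketch.
    """
    if not samples:
        return None
    answer, count = Counter(normalize(s) for s in samples).most_common(1)[0]
    return answer if count / len(samples) >= min_agreement else None


def reward(prediction: str, label: str) -> float:
    """Binary reward: 1.0 when the prediction matches the label, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(label) else 0.0


def build_stage1_data(
    questions: List[str],
    sample_answers: Callable[[str, int], List[str]],  # hypothetical policy sampler
    n_samples: int = 8,
) -> List[Tuple[str, str]]:
    """Pair each synthetic question with a pseudo-label, dropping unreliable ones."""
    data = []
    for question in questions:
        label = pseudo_label(sample_answers(question, n_samples))
        if label is not None:
            data.append((question, label))
    return data
```

Under these assumptions, the first training stage would score rollouts with `reward` against the pseudo-labels from `build_stage1_data`, and the second stage would reuse the same `reward` with human-annotated labels.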
format Preprint
id arxiv_https___arxiv_org_abs_2604_11547
institution arXiv
publishDate 2026
record_format arxiv
title Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach
topic Machine Learning
Computation and Language
url https://arxiv.org/abs/2604.11547