Bibliographic Details
Main Authors: Gu, Boyang, Zhou, Hongjian, Segal, Bradley Max, Wu, Jinge, Cao, Zeyu, Zhong, Hantao, Clifton, Lei, Liu, Fenglin, Clifton, David A.
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2512.00601
author Gu, Boyang
Zhou, Hongjian
Segal, Bradley Max
Wu, Jinge
Cao, Zeyu
Zhong, Hantao
Clifton, Lei
Liu, Fenglin
Clifton, David A.
contents Recent advances in large language models (LLMs) have shown strong reasoning capabilities through large-scale pretraining and post-training reinforcement learning, as demonstrated by DeepSeek-R1. However, current post-training methods, such as Group Relative Policy Optimization (GRPO), mainly reward correctness, which does not match the multi-dimensional objectives of high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based and verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train Clinical-R1-3B, a 3B-parameter model for clinical reasoning. Experiments on three benchmarks show that CRPO substantially improves the truthfulness and completeness of reasoning over standard GRPO while retaining solid accuracy gains. This framework provides a scalable pathway to align LLM reasoning with clinical objectives, enabling safer and more collaborative AI systems for healthcare, and highlights the potential of multi-objective, verifiable RL methods for post-training scaling of LLMs in medical domains.
format Preprint
id arxiv_https___arxiv_org_abs_2512_00601
institution arXiv
publishDate 2025
record_format arxiv
title Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization
topic Artificial Intelligence
url https://arxiv.org/abs/2512.00601