Bibliographic Details
Main Authors: Chen, Zhipeng; Min, Yingqian; Zhang, Beichen; Chen, Jie; Jiang, Jinhao; Cheng, Daixuan; Zhao, Wayne Xin; Liu, Zheng; Miao, Xu; Lu, Yang; Fang, Lei; Wang, Zhongyuan; Wen, Ji-Rong
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2503.04548
_version_ 1866912262730547200
author Chen, Zhipeng
Min, Yingqian
Zhang, Beichen
Chen, Jie
Jiang, Jinhao
Cheng, Daixuan
Zhao, Wayne Xin
Liu, Zheng
Miao, Xu
Lu, Yang
Fang, Lei
Wang, Zhongyuan
Wen, Ji-Rong
contents In this report, we present the third technical report on the development of slow-thinking models as part of the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for implementing such reasoning models. We systematically experiment with and document the effects of various factors influencing RL training, conducting experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models, enhancing both response length and test accuracy. Furthermore, we show that even when a model like DeepSeek-R1-Distill-Qwen-1.5B has already achieved a high performance level, it can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models. This approach achieves a remarkable accuracy of 86.67% with greedy search on AIME 2024, underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.
format Preprint
id arxiv_https___arxiv_org_abs_2503_04548
institution arXiv
publishDate 2025
record_format arxiv
title An Empirical Study on Eliciting and Improving R1-like Reasoning Models
topic Computation and Language
url https://arxiv.org/abs/2503.04548