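| Main authors: | Chen, Zhipeng; Min, Yingqian; Zhang, Beichen; Chen, Jie; Jiang, Jinhao; Cheng, Daixuan; Zhao, Wayne Xin; Liu, Zheng; Miao, Xu; Lu, Yang; Fang, Lei; Wang, Zhongyuan; Wen, Ji-Rong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online access: | https://arxiv.org/abs/2503.04548 |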
| _version_ | 1866912262730547200 |
|---|---|
| author | Chen, Zhipeng; Min, Yingqian; Zhang, Beichen; Chen, Jie; Jiang, Jinhao; Cheng, Daixuan; Zhao, Wayne Xin; Liu, Zheng; Miao, Xu; Lu, Yang; Fang, Lei; Wang, Zhongyuan; Wen, Ji-Rong |
| contents | This is the third technical report on the development of slow-thinking models in the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for implementing such reasoning models. We systematically experiment with and document the effects of various factors that influence RL training, running experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models, increasing both response length and test accuracy. Furthermore, we show that even a model that has already reached a high performance level, such as DeepSeek-R1-Distill-Qwen-1.5B, can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore tool manipulation and find that it significantly boosts the reasoning performance of large reasoning models. This approach achieves an accuracy of 86.67% with greedy search on AIME 2024, underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2503_04548 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | An Empirical Study on Eliciting and Improving R1-like Reasoning Models |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2503.04548 |
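
The `id` value in this record appears to be the `url` value with every non-alphanumeric character replaced by an underscore and an `arxiv_` prefix added. The following is a minimal sketch of that mapping, inferred only from this single record; the function names `record_id_from_url` and `arxiv_id_from_url` are hypothetical and do not come from the catalog software.

```python
import re


def record_id_from_url(url: str, prefix: str = "arxiv") -> str:
    """Hypothetical helper: build a catalog-style id from a source URL.

    Assumption (inferred from one record): every character that is not an
    ASCII letter or digit becomes '_', and the result is prefixed with
    '<prefix>_'.
    """
    return f"{prefix}_" + re.sub(r"[^0-9A-Za-z]", "_", url)


def arxiv_id_from_url(url: str) -> str:
    """Hypothetical helper: extract the arXiv identifier from an abs URL."""
    match = re.search(r"arxiv\.org/abs/(\d{4}\.\d{4,5})(?:v\d+)?$", url)
    if match is None:
        raise ValueError(f"not an arXiv abstract URL: {url!r}")
    return match.group(1)


if __name__ == "__main__":
    url = "https://arxiv.org/abs/2503.04548"
    print(record_id_from_url(url))
    print(arxiv_id_from_url(url))
```

Applied to the `url` field above, the first helper reproduces the `id` field shown in this record, and the second returns the bare arXiv identifier. Other records may follow a different scheme.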