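| Main Authors: | Ling Team; Tang, Caizhi; Fu, Chilin; Wu, Chunwei; Guo, Jia; Wang, Jianwen; Hu, Jingyu; Jiang, Liang; Li, Meng; Jiao, Peng; Liu, Pingping; Zheng, Shaomian; Liang, Shiwei; Li, Shuaicheng; Zhang, Yalin; Wu, Yingting; Liu, Yongkang; Huang, Zhenyu |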
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning; Computation and Language |
| Online Access: | https://arxiv.org/abs/2504.07158 |
| _version_ | 1866912320326729728 |
|---|---|
| author | Ling Team; Tang, Caizhi; Fu, Chilin; Wu, Chunwei; Guo, Jia; Wang, Jianwen; Hu, Jingyu; Jiang, Liang; Li, Meng; Jiao, Peng; Liu, Pingping; Zheng, Shaomian; Liang, Shiwei; Li, Shuaicheng; Zhang, Yalin; Wu, Yingting; Liu, Yongkang; Huang, Zhenyu |
| contents | This technical report presents Ring-Lite-Distill, a lightweight reasoning model derived from our open-source Mixture-of-Experts (MoE) Large Language Model (LLM), Ling-Lite. This study demonstrates that, through meticulous high-quality data curation and ingenious training paradigms, the compact MoE model Ling-Lite can be further trained to achieve exceptional reasoning capabilities while maintaining its parameter-efficient architecture with only 2.75 billion activated parameters, establishing an efficient lightweight reasoning architecture. In constructing this model, we have focused not merely on enhancing advanced reasoning capabilities, exemplified by high-difficulty mathematical problem solving, but on developing a reasoning model with more comprehensive competency coverage. Our approach ensures coverage across reasoning tasks of varying difficulty levels while preserving generic capabilities, such as instruction following, tool use, and knowledge retention. We show that Ring-Lite-Distill's reasoning ability reaches a level comparable to DeepSeek-R1-Distill-Qwen-7B, while its general capabilities significantly surpass those of DeepSeek-R1-Distill-Qwen-7B. The models are accessible at https://huggingface.co/inclusionAI |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_07158 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models |
| topic | Machine Learning; Computation and Language |
| url | https://arxiv.org/abs/2504.07158 |