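| Main Authors: | Xu, Mingyu; Fang, Cheng; Jiang, Keyue; Zheng, Yuqian; Xiao, Yanghua; Zhou, Baojian; Zhao, Qifang; Zheng, Suhang; Zhu, Xiuwen; Tang, Jiyang; Zhao, Yongchi; Luo, Yijia; Bai, Zhiqi; Xu, Yuchi; Su, Wenbo; Wang, Wei; Zhao, Bing; Qu, Lin; Xu, Xiaoxiao |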
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2601.01562 |
| _version_ | 1866911386221674496 |
|---|---|
| author | Xu, Mingyu; Fang, Cheng; Jiang, Keyue; Zheng, Yuqian; Xiao, Yanghua; Zhou, Baojian; Zhao, Qifang; Zheng, Suhang; Zhu, Xiuwen; Tang, Jiyang; Zhao, Yongchi; Luo, Yijia; Bai, Zhiqi; Xu, Yuchi; Su, Wenbo; Wang, Wei; Zhao, Bing; Qu, Lin; Xu, Xiaoxiao |
| contents | We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on Logics-STEM-SFT-Dataset, a high-quality, diverse dataset at 10M scale and one of the largest open-source long chain-of-thought corpora. Logics-STEM targets reasoning tasks in Science, Technology, Engineering, and Mathematics (STEM) and outperforms the next-best model at 8B scale by an average of 4.68% on STEM-related benchmarks. We attribute the gains to our data-algorithm co-design engine, in which data and algorithm are jointly optimized to fit a gold-standard distribution behind reasoning. Data-wise, the Logics-STEM-SFT-Dataset is built by a five-stage data curation engine designed to ensure quality, diversity, and scalability: annotation, deduplication, decontamination, distillation, and stratified sampling. Algorithm-wise, our failure-driven post-training framework applies targeted knowledge retrieval and data synthesis around model failure regions in the supervised fine-tuning (SFT) stage, guiding the second-stage SFT or reinforcement learning (RL) to better fit the target distribution. The strong empirical performance of Logics-STEM shows the potential of combining large-scale open-source data with carefully designed synthetic data, and underscores the critical role of data-algorithm co-design in enhancing reasoning capabilities through post-training. We make both the Logics-STEM models (8B and 32B) and the Logics-STEM-SFT-Dataset (10M and downsampled 2.2M versions) publicly available to support future research in the open-source community. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_01562 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2601.01562 |