Bibliographic Details
Main Authors: Xu, Mingyu; Fang, Cheng; Jiang, Keyue; Zheng, Yuqian; Xiao, Yanghua; Zhou, Baojian; Zhao, Qifang; Zheng, Suhang; Zhu, Xiuwen; Tang, Jiyang; Zhao, Yongchi; Luo, Yijia; Bai, Zhiqi; Xu, Yuchi; Su, Wenbo; Wang, Wei; Zhao, Bing; Qu, Lin; Xu, Xiaoxiao
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.01562
_version_ 1866911386221674496
author Xu, Mingyu
Fang, Cheng
Jiang, Keyue
Zheng, Yuqian
Xiao, Yanghua
Zhou, Baojian
Zhao, Qifang
Zheng, Suhang
Zhu, Xiuwen
Tang, Jiyang
Zhao, Yongchi
Luo, Yijia
Bai, Zhiqi
Xu, Yuchi
Su, Wenbo
Wang, Wei
Zhao, Bing
Qu, Lin
Xu, Xiaoxiao
contents We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on Logics-STEM-SFT-Dataset, a high-quality, diverse dataset at 10M scale that is one of the largest open-source long chain-of-thought corpora. Logics-STEM targets reasoning tasks in Science, Technology, Engineering, and Mathematics (STEM) and performs strongly on STEM-related benchmarks, with an average improvement of 4.68% over the next-best model at 8B scale. We attribute the gains to our data-algorithm co-design engine, in which data and algorithm are jointly optimized to fit a gold-standard distribution behind reasoning. Data-wise, the Logics-STEM-SFT-Dataset is built by a five-stage data curation engine designed to ensure quality, diversity, and scalability: annotation, deduplication, decontamination, distillation, and stratified sampling. Algorithm-wise, our failure-driven post-training framework applies targeted knowledge retrieval and data synthesis around model failure regions in the supervised fine-tuning (SFT) stage to guide second-stage SFT or reinforcement learning (RL) toward a better fit to the target distribution. The strong empirical performance of Logics-STEM shows the potential of combining large-scale open-source data with carefully designed synthetic data, and underscores the role of data-algorithm co-design in improving reasoning capabilities through post-training. We make both the Logics-STEM models (8B and 32B) and the Logics-STEM-SFT-Dataset (10M and downsampled 2.2M versions) publicly available to support future research in the open-source community.
format Preprint
id arxiv_https___arxiv_org_abs_2601_01562
institution arXiv
publishDate 2026
record_format arxiv
title Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement
topic Artificial Intelligence
url https://arxiv.org/abs/2601.01562