Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Mingyu; Fang, Cheng; Jiang, Keyue; Zheng, Yuqian; Xiao, Yanghua; Zhou, Baojian; Zhao, Qifang; Zheng, Suhang; Zhu, Xiuwen; Tang, Jiyang; Zhao, Yongchi; Luo, Yijia; Bai, Zhiqi; Xu, Yuchi; Su, Wenbo; Wang, Wei; Zhao, Bing; Qu, Lin; Xu, Xiaoxiao
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.01562

_version_	1866911386221674496
author	Xu, Mingyu; Fang, Cheng; Jiang, Keyue; Zheng, Yuqian; Xiao, Yanghua; Zhou, Baojian; Zhao, Qifang; Zheng, Suhang; Zhu, Xiuwen; Tang, Jiyang; Zhao, Yongchi; Luo, Yijia; Bai, Zhiqi; Xu, Yuchi; Su, Wenbo; Wang, Wei; Zhao, Bing; Qu, Lin; Xu, Xiaoxiao
author_facet	Xu, Mingyu; Fang, Cheng; Jiang, Keyue; Zheng, Yuqian; Xiao, Yanghua; Zhou, Baojian; Zhao, Qifang; Zheng, Suhang; Zhu, Xiuwen; Tang, Jiyang; Zhao, Yongchi; Luo, Yijia; Bai, Zhiqi; Xu, Yuchi; Su, Wenbo; Wang, Wei; Zhao, Bing; Qu, Lin; Xu, Xiaoxiao
contents	We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on Logics-STEM-SFT-Dataset, a high-quality and diverse dataset at 10M scale that represents one of the largest-scale open-source long chain-of-thought corpora. Logics-STEM targets reasoning tasks in the domains of Science, Technology, Engineering, and Mathematics (STEM), and exhibits exceptional performance on STEM-related benchmarks with an average improvement of 4.68% over the next-best model at 8B scale. We attribute the gains to our data-algorithm co-design engine, in which data and algorithm are jointly optimized to fit a gold-standard distribution behind reasoning. Data-wise, the Logics-STEM-SFT-Dataset is constructed by a meticulously designed five-stage data curation engine (annotation, deduplication, decontamination, distillation, and stratified sampling) that ensures quality, diversity, and scalability. Algorithm-wise, our failure-driven post-training framework leverages targeted knowledge retrieval and data synthesis around model failure regions in the Supervised Fine-tuning (SFT) stage to effectively guide the second-stage SFT or the reinforcement learning (RL) to better fit the target distribution. The superior empirical performance of Logics-STEM reveals the vast potential of combining large-scale open-source data with carefully designed synthetic data, underscoring the critical role of data-algorithm co-design in enhancing reasoning capabilities through post-training. We make both the Logics-STEM models (8B and 32B) and the Logics-STEM-SFT-Dataset (10M and downsampled 2.2M versions) publicly available to support future research in the open-source community.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_01562
institution	arXiv
publishDate	2026
record_format	arxiv
title	Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement
topic	Artificial Intelligence
url	https://arxiv.org/abs/2601.01562

Similar Items