Staff View: Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers :: Library Catalog

Bibliographic Details
Main Authors:	Huang, Xingyue; Rishabh; Franke, Gregor; Yang, Ziyi; Bai, Jiamu; Bai, Weijie; Bi, Jinhe; Ding, Zifeng; Duan, Yiqun; Fan, Chengyu; Fan, Wendong; Gao, Xin; Guo, Ruohao; He, Yuan; He, Zhuangzhuang; Hu, Xianglong; Johnson, Neil; Li, Bowen; Lin, Fangru; Lin, Siyu; Liu, Tong; Ma, Yunpu; Shen, Hao; Sun, Hao; Wang, Beibei; Wang, Fangyijie; Wang, Hao; Wang, Haoran; Wang, Yang; Wang, Yifeng; Wang, Zhaowei; Wang, Ziyang; Wu, Yifan; Xiao, Zikai; Xie, Chengxing; Yang, Fan; Yang, Junxiao; Ye, Qianshuo; Ye, Ziyu; Zeng, Guangtao; Zhang, Yuwen Ebony; Zhang, Zeyu; Zhu, Zihao; Ghanem, Bernard; Torr, Philip; Li, Guohao
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.03059

_version_	1866912568980799488
author	Huang, Xingyue; Rishabh; Franke, Gregor; Yang, Ziyi; Bai, Jiamu; Bai, Weijie; Bi, Jinhe; Ding, Zifeng; Duan, Yiqun; Fan, Chengyu; Fan, Wendong; Gao, Xin; Guo, Ruohao; He, Yuan; He, Zhuangzhuang; Hu, Xianglong; Johnson, Neil; Li, Bowen; Lin, Fangru; Lin, Siyu; Liu, Tong; Ma, Yunpu; Shen, Hao; Sun, Hao; Wang, Beibei; Wang, Fangyijie; Wang, Hao; Wang, Haoran; Wang, Yang; Wang, Yifeng; Wang, Zhaowei; Wang, Ziyang; Wu, Yifan; Xiao, Zikai; Xie, Chengxing; Yang, Fan; Yang, Junxiao; Ye, Qianshuo; Ye, Ziyu; Zeng, Guangtao; Zhang, Yuwen Ebony; Zhang, Zeyu; Zhu, Zihao; Ghanem, Bernard; Torr, Philip; Li, Guohao
contents	Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due to the scarcity of high-quality, verifiable datasets and the high cost of human supervision. In this work, we introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification across a diverse range of reasoning-intensive domains. The framework consists of two key components: (1) LoongBench, a curated seed dataset containing 8,729 human-vetted examples across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired with executable code and rich metadata; and (2) LoongEnv, a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples. Together, these components form an agent-environment loop that enables reinforcement learning, where an LLM-based agent is rewarded for generating Chain-of-Thought (CoT) solutions that align with code-executed answers. Empirically, we benchmark LoongBench on a broad suite of both open-source and proprietary LLMs to evaluate domain coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive analysis of synthetic data generated by LoongEnv, examining correctness, difficulty, and diversity. Code and documentation are available at https://github.com/camel-ai/loong.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_03059
institution	arXiv
publishDate	2025
record_format	arxiv
title	Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2509.03059
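
The contents field above describes rewarding an agent when its Chain-of-Thought answer aligns with a code-executed answer. A minimal sketch of that idea follows. It is not the Loong implementation: the function names, the convention that seed code binds its answer to a variable called `result`, and the "Final answer:" marker are all assumptions made for illustration.

```python
import re
from typing import Optional


def run_reference_code(code: str) -> str:
    """Run the seed code and return the value bound to `result` as a string.

    Assumption: the code stores its answer in a variable named `result`.
    exec() is unsafe for untrusted code; a real setup would use a sandbox.
    """
    namespace: dict = {}
    exec(code, namespace)
    return str(namespace["result"])


def extract_final_answer(cot: str) -> Optional[str]:
    """Pull the answer from a CoT that contains a 'Final answer: ...' line.

    Assumption: the marker text is hypothetical, chosen for this sketch.
    """
    match = re.search(r"Final answer:\s*(.+)", cot)
    return match.group(1).strip() if match else None


def verifiable_reward(cot: str, code: str) -> float:
    """Return 1.0 if the CoT answer equals the code-executed answer, else 0.0."""
    predicted = extract_final_answer(cot)
    if predicted is None:
        return 0.0  # no parseable answer, so nothing to verify
    return 1.0 if predicted == run_reference_code(code) else 0.0


if __name__ == "__main__":
    seed_code = "result = sum(range(1, 11))"
    good_cot = "Pair the numbers 1..10 into 5 pairs of 11.\nFinal answer: 55"
    bad_cot = "Add them up quickly.\nFinal answer: 54"
    print(verifiable_reward(good_cot, seed_code))
    print(verifiable_reward(bad_cot, seed_code))
```

Exact string comparison is the simplest possible check; domains with numeric tolerance or symbolic answers would presumably need a more forgiving comparison.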

Similar Items