author Huang, Xingyue
Rishabh
Franke, Gregor
Yang, Ziyi
Bai, Jiamu
Bai, Weijie
Bi, Jinhe
Ding, Zifeng
Duan, Yiqun
Fan, Chengyu
Fan, Wendong
Gao, Xin
Guo, Ruohao
He, Yuan
He, Zhuangzhuang
Hu, Xianglong
Johnson, Neil
Li, Bowen
Lin, Fangru
Lin, Siyu
Liu, Tong
Ma, Yunpu
Shen, Hao
Sun, Hao
Wang, Beibei
Wang, Fangyijie
Wang, Hao
Wang, Haoran
Wang, Yang
Wang, Yifeng
Wang, Zhaowei
Wang, Ziyang
Wu, Yifan
Xiao, Zikai
Xie, Chengxing
Yang, Fan
Yang, Junxiao
Ye, Qianshuo
Ye, Ziyu
Zeng, Guangtao
Zhang, Yuwen Ebony
Zhang, Zeyu
Zhu, Zihao
Ghanem, Bernard
Torr, Philip
Li, Guohao
contents Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due to the scarcity of high-quality, verifiable datasets and the high cost of human supervision. In this work, we introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification across a diverse range of reasoning-intensive domains. The framework consists of two key components: (1) LoongBench, a curated seed dataset containing 8,729 human-vetted examples across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired with executable code and rich metadata; and (2) LoongEnv, a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples. Together, these components form an agent-environment loop that enables reinforcement learning, where an LLM-based agent is rewarded for generating Chain-of-Thought (CoT) solutions that align with code-executed answers. Empirically, we benchmark LoongBench on a broad suite of both open-source and proprietary LLMs to evaluate domain coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive analysis of synthetic data generated by LoongEnv, examining correctness, difficulty, and diversity. Code and documentation are available at https://github.com/camel-ai/loong.
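The reward signal described in the abstract (a Chain-of-Thought solution is rewarded when its answer agrees with the code-executed answer) can be illustrated with a minimal sketch. This is not the Loong implementation: the function names, the convention that seed code stores its answer in a variable named `result`, the "Final answer:" marker, and the binary exact-match reward are all assumptions made for illustration only.

```python
import re
from typing import Optional


def execute_reference(code: str) -> str:
    """Run the seed code and return the value it leaves in `result` (assumed convention)."""
    namespace: dict = {}
    # exec is used only for this trusted toy input; real use would need a sandbox.
    exec(code, namespace)
    return str(namespace["result"])


def extract_final_answer(cot: str) -> Optional[str]:
    """Pull the text after a hypothetical 'Final answer:' marker from a CoT string."""
    match = re.search(r"Final answer:\s*(.+)", cot)
    return match.group(1).strip() if match else None


def verifiable_reward(cot: str, code: str) -> float:
    """Return 1.0 if the CoT answer matches the code-executed answer, else 0.0."""
    predicted = extract_final_answer(cot)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == execute_reference(code) else 0.0


if __name__ == "__main__":
    seed_code = "result = sum(range(1, 11))"
    good_cot = "Adding the integers 1 through 10 gives 55.\nFinal answer: 55"
    bad_cot = "Adding the integers 1 through 10 gives 54.\nFinal answer: 54"
    print(verifiable_reward(good_cot, seed_code))
    print(verifiable_reward(bad_cot, seed_code))
```

Exact string matching is the simplest possible comparison; a practical verifier would likely normalize numeric formats or units before comparing, which this sketch deliberately leaves out.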
format Preprint
id arxiv_https___arxiv_org_abs_2509_03059
institution arXiv
publishDate 2025
record_format arxiv
title Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2509.03059