Bibliographic Details
Main Authors: Yuan, Weizhe; Yu, Jane; Jiang, Song; Padthe, Karthik; Li, Yang; Kulikov, Ilia; Cho, Kyunghyun; Wang, Dong; Tian, Yuandong; Weston, Jason E; Li, Xian
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2502.13124
_version_ 1866911252522991616
author Yuan, Weizhe
Yu, Jane
Jiang, Song
Padthe, Karthik
Li, Yang
Kulikov, Ilia
Cho, Kyunghyun
Wang, Dong
Tian, Yuandong
Weston, Jason E
Li, Xian
contents Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NaturalReasoning, a comprehensive dataset comprising 2.8 million questions that span multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NaturalReasoning through knowledge distillation experiments which show that NaturalReasoning can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NaturalReasoning is also effective for unsupervised self-training using external reward models or self-rewarding. To foster future work, we publicly release NaturalReasoning at https://huggingface.co/datasets/facebook/natural_reasoning.
format Preprint
id arxiv_https___arxiv_org_abs_2502_13124
institution arXiv
publishDate 2025
record_format arxiv
title NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
topic Computation and Language
url https://arxiv.org/abs/2502.13124