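| Main authors: | Yuan, Weizhe; Yu, Jane; Jiang, Song; Padthe, Karthik; Li, Yang; Kulikov, Ilia; Cho, Kyunghyun; Wang, Dong; Tian, Yuandong; Weston, Jason E; Li, Xian |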
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online access: | https://arxiv.org/abs/2502.13124 |
| _version_ | 1866911252522991616 |
|---|---|
| author | Yuan, Weizhe; Yu, Jane; Jiang, Song; Padthe, Karthik; Li, Yang; Kulikov, Ilia; Cho, Kyunghyun; Wang, Dong; Tian, Yuandong; Weston, Jason E; Li, Xian |
| author_facet | Yuan, Weizhe; Yu, Jane; Jiang, Song; Padthe, Karthik; Li, Yang; Kulikov, Ilia; Cho, Kyunghyun; Wang, Dong; Tian, Yuandong; Weston, Jason E; Li, Xian |
| contents | Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NaturalReasoning, a comprehensive dataset comprising 2.8 million questions that span multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NaturalReasoning through knowledge distillation experiments which show that NaturalReasoning can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NaturalReasoning is also effective for unsupervised self-training using external reward models or self-rewarding. To foster future work, we publicly release NaturalReasoning at https://huggingface.co/datasets/facebook/natural_reasoning. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_13124 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2502.13124 |