| Main Authors: | Yuan, Weizhe, Yu, Jane, Jiang, Song, Padthe, Karthik, Li, Yang, Kulikov, Ilia, Cho, Kyunghyun, Wang, Dong, Tian, Yuandong, Weston, Jason E, Li, Xian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.13124 |
Similar Items
NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks
by: Li, Yang, et al.
Published: (2025)
System-Level Natural Language Feedback
by: Yuan, Weizhe, et al.
Published: (2023)
Following Length Constraints in Instructions
by: Yuan, Weizhe, et al.
Published: (2024)
Iterative Reasoning Preference Optimization
by: Pang, Richard Yuanzhe, et al.
Published: (2024)
SPICE: Self-Play In Corpus Environments Improves Reasoning
by: Liu, Bo, et al.
Published: (2025)
LLM Pretraining with Continuous Concepts
by: Tack, Jihoon, et al.
Published: (2025)
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)
Self-Rewarding Language Models
by: Yuan, Weizhe, et al.
Published: (2024)
Training Large Language Models to Reason in a Continuous Latent Space
by: Hao, Shibo, et al.
Published: (2024)
Self-Taught Evaluators
by: Wang, Tianlu, et al.
Published: (2024)
Distilling System 2 into System 1
by: Yu, Ping, et al.
Published: (2024)
Learning to Reason for Factuality
by: Chen, Xilun, et al.
Published: (2025)
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
by: Whitehouse, Chenxi, et al.
Published: (2025)
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
by: Yu, Ping, et al.
Published: (2025)
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
by: Wu, Tianhao, et al.
Published: (2024)
StepWiser: Stepwise Generative Judges for Wiser Reasoning
by: Xiong, Wei, et al.
Published: (2025)
Bridging Offline and Online Reinforcement Learning for LLMs
by: Lanchantin, Jack, et al.
Published: (2025)
Reasoning over mathematical objects: on-policy reward modeling and test time aggregation
by: Aggarwal, Pranjal, et al.
Published: (2026)
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
by: Saha, Swarnadeep, et al.
Published: (2025)
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)
Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation
by: Lin, Zi, et al.
Published: (2025)
The Majority is not always right: RL training for solution aggregation
by: Zhao, Wenting, et al.
Published: (2025)
An Overview of Large Language Models for Statisticians
by: Ji, Wenlong, et al.
Published: (2025)
Leveraging Implicit Feedback from Deployment Data in Dialogue
by: Pang, Richard Yuanzhe, et al.
Published: (2023)
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
by: Aggarwal, Pranjal, et al.
Published: (2025)
Adaptive Decoding via Latent Preference Optimization
by: Dhuliawala, Shehzaad, et al.
Published: (2024)
Diverse Preference Optimization
by: Lanchantin, Jack, et al.
Published: (2025)
Self-Challenging Language Model Agents
by: Zhou, Yifei, et al.
Published: (2025)
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
by: Tao, Leitian, et al.
Published: (2025)
Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs
by: Chen, Angelica, et al.
Published: (2023)
A Non-classification Result for Wild Knots
by: Kulikov, Vadim
Published: (2015)
Self-Improving Pretraining: using post-trained models to pretrain better models
by: Tan, Ellen Xiaoqing, et al.
Published: (2026)
A Brief Introduction to Causal Inference in Machine Learning
by: Cho, Kyunghyun
Published: (2024)
Machine Learning: a Lecture Note
by: Cho, Kyunghyun
Published: (2025)
Thinking LLMs: General Instruction Following with Thought Generation
by: Wu, Tianhao, et al.
Published: (2024)
Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild
by: Hu, Wanpeng, et al.
Published: (2025)
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
by: Lin, Yen-Ting, et al.
Published: (2025)
Reasoning in the Wild
by: Thalos, Mariam
Published: (2025)
R.I.P.: Better Models by Survival of the Fittest Prompts
by: Yu, Ping, et al.
Published: (2025)
Can LLMs Reason in the Wild with Programs?
by: Yang, Yuan, et al.
Published: (2024)