:: Library Catalog

Bibliographic Details
Main Authors:	Lehnert, Lucas; Sukhbaatar, Sainbayar; Su, DiJia; Zheng, Qinqing; Mcvay, Paul; Rabbat, Michael; Tian, Yuandong
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.14083

Similar Items

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
by: Su, DiJia, et al.
Published: (2024)

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
by: Su, DiJia, et al.
Published: (2025)

Training Large Language Models to Reason in a Continuous Latent Space
by: Hao, Shibo, et al.
Published: (2024)

GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection
by: Su, DiJia, et al.
Published: (2025)

Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
by: Ding, Zihan, et al.
Published: (2024)

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
by: Wu, Tianhao, et al.
Published: (2024)

R.I.P.: Better Models by Survival of the Fittest Prompts
by: Yu, Ping, et al.
Published: (2025)

Contextual Position Encoding: Learning to Count What's Important
by: Golovneva, Olga, et al.
Published: (2024)

Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss
by: Xu, Jing, et al.
Published: (2023)

Reverse Training to Nurse the Reversal Curse
by: Golovneva, Olga, et al.
Published: (2024)

Towards General-Purpose Model-Free Reinforcement Learning
by: Fujimoto, Scott, et al.
Published: (2025)

Self-Challenging Language Model Agents
by: Zhou, Yifei, et al.
Published: (2025)

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
by: Liu, Yixin, et al.
Published: (2026)

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
by: Zhou, Yifei, et al.
Published: (2025)

Thinking LLMs: General Instruction Following with Thought Generation
by: Wu, Tianhao, et al.
Published: (2024)

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)

Iterative Reasoning Preference Optimization
by: Pang, Richard Yuanzhe, et al.
Published: (2024)

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
by: Wang, Chenyu, et al.
Published: (2025)

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
by: Lin, Yen-Ting, et al.
Published: (2025)

StepWiser: Stepwise Generative Judges for Wiser Reasoning
by: Xiong, Wei, et al.
Published: (2025)

Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
by: Rashidinejad, Paria, et al.
Published: (2024)

Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)

Self-Rewarding Language Models
by: Yuan, Weizhe, et al.
Published: (2024)

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
by: Tian, Yuandong, et al.
Published: (2023)

Multi-Token Attention
by: Golovneva, Olga, et al.
Published: (2025)

The Path Not Taken: RLVR Provably Learns Off the Principals
by: Zhu, Hanqing, et al.
Published: (2025)

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
by: Yu, Ping, et al.
Published: (2025)

Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning
by: Mustakim, Nasehatul, et al.
Published: (2026)

Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning
by: Huang, Taoan, et al.
Published: (2023)

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
by: Sikchi, Harshit, et al.
Published: (2023)

LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin Machines
by: Gao, Jiechao, et al.
Published: (2026)

Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)

Stochastic activations
by: Lomeli, Maria, et al.
Published: (2025)

Bootstrapping Human-Like Planning via LLMs
by: Porfirio, David, et al.
Published: (2025)

Scalable Option Learning in High-Throughput Environments
by: Henaff, Mikael, et al.
Published: (2025)

SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation
by: Zhang, Xichen, et al.
Published: (2026)

Bootstrapping LLMs via Preference-Based Policy Optimization
by: Jia, Chen
Published: (2025)

Adaptive Decoding via Latent Preference Optimization
by: Dhuliawala, Shehzaad, et al.
Published: (2024)

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
by: Zheng, Qinqing, et al.
Published: (2024)

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
by: Sukhbaatar, Sainbayar, et al.
Published: (2024)