| Main Authors: | Rybkin, Oleh, Nauman, Michal, Fu, Preston, Snell, Charlie, Abbeel, Pieter, Levine, Sergey, Kumar, Aviral |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.04327 |
Similar Items
Compute-Optimal Scaling for Value-Based Deep RL
by: Fu, Preston, et al.
Published: (2025)
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
by: Nauman, Michal, et al.
Published: (2025)
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
by: Park, Seohong, et al.
Published: (2023)
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
by: Agrawalla, Bhavya, et al.
Published: (2025)
Is Value Learning Really the Main Bottleneck in Offline RL?
by: Park, Seohong, et al.
Published: (2024)
Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos
by: Ye, Weirui, et al.
Published: (2025)
Reward-Conditioned Reinforcement Learning
by: Nauman, Michal, et al.
Published: (2026)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
by: Snell, Charlie, et al.
Published: (2024)
Scaling Test-Time Compute Without Verification or RL is Suboptimal
by: Setlur, Amrith, et al.
Published: (2025)
Predicting Emergent Capabilities by Finetuning
by: Snell, Charlie, et al.
Published: (2024)
A Stable Whitening Optimizer for Efficient Neural Network Training
by: Frans, Kevin, et al.
Published: (2025)
What Really Matters in Matrix-Whitening Optimizers?
by: Frans, Kevin, et al.
Published: (2025)
Cliqueformer: Model-Based Optimization with Structured Transformers
by: Kuba, Jakub Grudzien, et al.
Published: (2024)
What Does Flow Matching Bring To TD Learning?
by: Agrawalla, Bhavya, et al.
Published: (2026)
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
by: Nakamoto, Mitsuhiko, et al.
Published: (2024)
When Does Non-Uniform Replay Matter in Reinforcement Learning?
by: Korniak, Michal, et al.
Published: (2026)
Diffusion Guidance Is a Controllable Policy Improvement Operator
by: Frans, Kevin, et al.
Published: (2025)
Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
by: Frans, Kevin, et al.
Published: (2024)
Horizon Reduction Makes RL Scalable
by: Park, Seohong, et al.
Published: (2025)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
by: Zhou, Yifei, et al.
Published: (2024)
One Step Diffusion via Shortcut Models
by: Frans, Kevin, et al.
Published: (2024)
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization
by: Kuba, Jakub Grudzien, et al.
Published: (2024)
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
by: Bai, Hao, et al.
Published: (2025)
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
by: Farebrother, Jesse, et al.
Published: (2024)
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
by: Bai, Hao, et al.
Published: (2024)
Prioritized Generative Replay
by: Wang, Renhao, et al.
Published: (2024)
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control
by: Seo, Younggyo, et al.
Published: (2025)
SOMBRL: Scalable and Optimistic Model-Based RL
by: Sukhija, Bhavya, et al.
Published: (2025)
Transitive RL: Value Learning via Divide and Conquer
by: Park, Seohong, et al.
Published: (2025)
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
by: Zhou, Zhiyuan, et al.
Published: (2024)
Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)
Privileged Sensing Scaffolds Reinforcement Learning
by: Hu, Edward S., et al.
Published: (2024)
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
by: Rafailov, Rafael, et al.
Published: (2024)
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
by: Nakamoto, Mitsuhiko, et al.
Published: (2023)
ViVa: Video-Trained Value Functions for Guiding Online RL from Diverse Data
by: Dashora, Nitish, et al.
Published: (2025)
RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
by: Wu, Mian, et al.
Published: (2025)
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
by: Kang, Katie, et al.
Published: (2024)
Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration
by: Kim, Dongyoung, et al.
Published: (2023)
Relative Entropy Pathwise Policy Optimization
by: Voelcker, Claas, et al.
Published: (2025)
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
by: Setlur, Amrith, et al.
Published: (2025)