:: Library Catalog

Bibliographic Details
Main Authors:	Rybkin, Oleh; Nauman, Michal; Fu, Preston; Snell, Charlie; Abbeel, Pieter; Levine, Sergey; Kumar, Aviral
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.04327

Similar Items

Compute-Optimal Scaling for Value-Based Deep RL
by: Fu, Preston, et al.
Published: (2025)

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
by: Nauman, Michal, et al.
Published: (2025)

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
by: Park, Seohong, et al.
Published: (2023)

floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
by: Agrawalla, Bhavya, et al.
Published: (2025)

Is Value Learning Really the Main Bottleneck in Offline RL?
by: Park, Seohong, et al.
Published: (2024)

Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos
by: Ye, Weirui, et al.
Published: (2025)

Reward-Conditioned Reinforcement Learning
by: Nauman, Michal, et al.
Published: (2026)

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
by: Snell, Charlie, et al.
Published: (2024)

Scaling Test-Time Compute Without Verification or RL is Suboptimal
by: Setlur, Amrith, et al.
Published: (2025)

Predicting Emergent Capabilities by Finetuning
by: Snell, Charlie, et al.
Published: (2024)

A Stable Whitening Optimizer for Efficient Neural Network Training
by: Frans, Kevin, et al.
Published: (2025)

What Really Matters in Matrix-Whitening Optimizers?
by: Frans, Kevin, et al.
Published: (2025)

Cliqueformer: Model-Based Optimization with Structured Transformers
by: Kuba, Jakub Grudzien, et al.
Published: (2024)

What Does Flow Matching Bring To TD Learning?
by: Agrawalla, Bhavya, et al.
Published: (2026)

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
by: Nakamoto, Mitsuhiko, et al.
Published: (2024)

When Does Non-Uniform Replay Matter in Reinforcement Learning?
by: Korniak, Michal, et al.
Published: (2026)

Diffusion Guidance Is a Controllable Policy Improvement Operator
by: Frans, Kevin, et al.
Published: (2025)

Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
by: Frans, Kevin, et al.
Published: (2024)

Horizon Reduction Makes RL Scalable
by: Park, Seohong, et al.
Published: (2025)

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
by: Zhou, Yifei, et al.
Published: (2024)

One Step Diffusion via Shortcut Models
by: Frans, Kevin, et al.
Published: (2024)

Functional Graphical Models: Structure Enables Offline Data-Driven Optimization
by: Kuba, Jakub Grudzien, et al.
Published: (2024)

Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
by: Bai, Hao, et al.
Published: (2025)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
by: Farebrother, Jesse, et al.
Published: (2024)

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
by: Bai, Hao, et al.
Published: (2024)

Prioritized Generative Replay
by: Wang, Renhao, et al.
Published: (2024)

FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control
by: Seo, Younggyo, et al.
Published: (2025)

SOMBRL: Scalable and Optimistic Model-Based RL
by: Sukhija, Bhavya, et al.
Published: (2025)

Transitive RL: Value Learning via Divide and Conquer
by: Park, Seohong, et al.
Published: (2025)

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
by: Zhou, Zhiyuan, et al.
Published: (2024)

Vision-Language Models Provide Promptable Representations for Reinforcement Learning
by: Chen, William, et al.
Published: (2024)

Privileged Sensing Scaffolds Reinforcement Learning
by: Hu, Edward S., et al.
Published: (2024)

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
by: Rafailov, Rafael, et al.
Published: (2024)

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
by: Nakamoto, Mitsuhiko, et al.
Published: (2023)

ViVa: Video-Trained Value Functions for Guiding Online RL from Diverse Data
by: Dashora, Nitish, et al.
Published: (2025)

RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
by: Wu, Mian, et al.
Published: (2025)

Unfamiliar Finetuning Examples Control How Language Models Hallucinate
by: Kang, Katie, et al.
Published: (2024)

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration
by: Kim, Dongyoung, et al.
Published: (2023)

Relative Entropy Pathwise Policy Optimization
by: Voelcker, Claas, et al.
Published: (2025)

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
by: Setlur, Amrith, et al.
Published: (2025)