| Main Authors: | Zhou, Allan; Finn, Chelsea; Harrison, James |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.05232 |
Similar Items
TQL: Scaling Q-Functions with Transformers by Preventing Attention Collapse
by: Dong, Perry, et al.
Published: (2026)
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
by: Hsu, Sheryl, et al.
Published: (2024)
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
by: Xie, Johnathan, et al.
Published: (2024)
EXPO: Stable Reinforcement Learning with Expressive Policies
by: Dong, Perry, et al.
Published: (2025)
FASTER: Value-Guided Sampling for Fast RL
by: Dong, Perry, et al.
Published: (2026)
Reinforcement Learning via Implicit Imitation Guidance
by: Dong, Perry, et al.
Published: (2025)
MemER: Scaling Up Memory for Robot Control via Experience Retrieval
by: Sridhar, Ajay, et al.
Published: (2025)
Learning Long-Context Diffusion Policies via Past-Token Prediction
by: Torne, Marcel, et al.
Published: (2025)
Value Flows
by: Dong, Perry, et al.
Published: (2025)
Curating Demonstrations using Online Experience
by: Chen, Annie S., et al.
Published: (2025)
Efficient Data Collection for Robotic Manipulation via Compositional Generalization
by: Gao, Jensen, et al.
Published: (2024)
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
by: Wagenmaker, Andrew, et al.
Published: (2025)
Polychromic Objectives for Reinforcement Learning
by: Hamid, Jubayer Ibn, et al.
Published: (2025)
Conservative Prediction via Data-Driven Confidence Minimization
by: Choi, Caroline, et al.
Published: (2023)
Affordance-Guided Reinforcement Learning via Visual Prompting
by: Lee, Olivia Y., et al.
Published: (2024)
Clarify: Improving Model Robustness With Natural Language Corrections
by: Lee, Yoonho, et al.
Published: (2024)
Calibrating Language Models with Adaptive Temperature Scaling
by: Xie, Johnathan, et al.
Published: (2024)
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
by: Kim, Moo Jin, et al.
Published: (2025)
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
by: Putta, Pranav, et al.
Published: (2024)
Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
by: Xiang, Violet, et al.
Published: (2025)
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning
by: Rafailov, Rafael, et al.
Published: (2024)
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
by: Liu, Yuejiang, et al.
Published: (2024)
Contrastive Preference Learning: Learning from Human Feedback without RL
by: Hejna, Joey, et al.
Published: (2023)
A Critical Evaluation of AI Feedback for Aligning Large Language Models
by: Sharma, Archit, et al.
Published: (2024)
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
by: Fu, Zipeng, et al.
Published: (2024)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
by: Rafailov, Rafael, et al.
Published: (2023)
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
by: Nakamoto, Mitsuhiko, et al.
Published: (2023)
RLVF: Learning from Verbal Feedback without Overgeneralization
by: Stephan, Moritz, et al.
Published: (2024)
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
by: Qu, Yuxiao, et al.
Published: (2025)
Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models
by: Wu, Qi, et al.
Published: (2024)
Deriving Neural Scaling Laws from the statistics of natural language
by: Cagnetta, Francesco, et al.
Published: (2026)
Target-Aligned Reinforcement Learning
by: Pleiss, Leonard S., et al.
Published: (2026)
Yell At Your Robot: Improving On-the-Fly from Language Corrections
by: Shi, Lucy Xiaoyang, et al.
Published: (2024)
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
by: Liu, Yuejiang, et al.
Published: (2026)
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
by: Rafailov, Rafael, et al.
Published: (2024)
HumanPlus: Humanoid Shadowing and Imitation from Humans
by: Fu, Zipeng, et al.
Published: (2024)
Neural Green's Functions
by: Yoo, Seungwoo, et al.
Published: (2025)
Universal Value-Function Uncertainties
by: Zanger, Moritz A., et al.
Published: (2025)
Towards Universal Neural Likelihood Inference
by: Brahmavar, Shreyas Bhat, et al.
Published: (2025)