| Main Authors: | Shimizu, Yutaka, Hong, Joey, Levine, Sergey, Tomizuka, Masayoshi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.04534 |
Similar Items
Bisimulation metric for Model Predictive Control
by: Shimizu, Yutaka, et al.
Published: (2024)
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
by: Hong, Joey, et al.
Published: (2024)
Adaptive Linear Path Model-Based Diffusion
by: Shimizu, Yutaka, et al.
Published: (2026)
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations
by: Hong, Joey, et al.
Published: (2024)
Residual Q-Learning: Offline and Online Policy Customization without Value
by: Li, Chenran, et al.
Published: (2023)
Flow Q-Learning
by: Park, Seohong, et al.
Published: (2025)
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
by: Hong, Joey, et al.
Published: (2025)
Q-learning with Adjoint Matching
by: Li, Qiyang, et al.
Published: (2026)
Generalizability Analysis of Graph-based Trajectory Predictor with Vectorized Representation
by: Lu, Juanwu, et al.
Published: (2022)
Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
by: Bai, Hao, et al.
Published: (2025)
Decoupled Q-Chunking
by: Li, Qiyang, et al.
Published: (2025)
Grounded Relational Inference: Domain Knowledge Driven Explainable Autonomous Driving
by: Tang, Chen, et al.
Published: (2021)
FDPP: Fine-tune Diffusion Policy with Human Preference
by: Chen, Yuxin, et al.
Published: (2025)
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
by: Hao, Ce, et al.
Published: (2023)
BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay
by: Weaver, Catherine, et al.
Published: (2024)
Zero-Overhead Introspection for Adaptive Test-Time Compute
by: Manvi, Rohin, et al.
Published: (2025)
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
by: Chen, Yuxin, et al.
Published: (2024)
Residual Policy Gradient: A Reward View of KL-regularized Objective
by: Wang, Pengcheng, et al.
Published: (2025)
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
by: Wang, Qinsi, et al.
Published: (2025)
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
by: Rafailov, Rafael, et al.
Published: (2024)
Unsupervised-to-Online Reinforcement Learning
by: Kim, Junsu, et al.
Published: (2024)
Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment
by: Tian, Ran, et al.
Published: (2024)
Visual Pre-Training on Unlabeled Images using Reinforcement Learning
by: Ghosh, Dibya, et al.
Published: (2025)
Testing Human-Hand Segmentation on In-Distribution and Out-of-Distribution Data in Human-Robot Interactions Using a Deep Ensemble Model
by: Jalayer, Reza, et al.
Published: (2025)
Aligning Flow Map Policies with Optimal Q-Guidance
by: Ziakas, Christos, et al.
Published: (2026)
DADP: Domain Adaptive Diffusion Policy
by: Wang, Pengcheng, et al.
Published: (2026)
Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers
by: Shimizu, Atsushi, et al.
Published: (2026)
Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving
by: Fei, Xin, et al.
Published: (2024)
Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking
by: Li, Jiachen, et al.
Published: (2021)
Mildly Conservative Q-Learning for Offline Reinforcement Learning
by: Lyu, Jiafei, et al.
Published: (2022)
Bootstrap Off-policy with World Model
by: Zhan, Guojian, et al.
Published: (2025)
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
by: Chang, Wei-Jer, et al.
Published: (2025)
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
by: Liang, Zhixuan, et al.
Published: (2023)
Reinforcement Learning with Action Chunking
by: Li, Qiyang, et al.
Published: (2025)
Multi-Camera View Scaling for Data-Efficient Robot Imitation Learning
by: Xie, Yichen, et al.
Published: (2026)
Behavioral Exploration: Learning to Explore via In-Context Adaptation
by: Wagenmaker, Andrew, et al.
Published: (2025)
Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
by: Zhan, Guojian, et al.
Published: (2025)
Peng's Q($λ$) for Conservative Value Estimation in Offline Reinforcement Learning
by: Kim, Byeongchan, et al.
Published: (2026)
RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes
by: Stachowicz, Kyle, et al.
Published: (2024)
Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning
by: Zheng, Bill Chunyuan, et al.
Published: (2025)