:: Library Catalog

Bibliographic Details
Main Authors:	Shimizu, Yutaka; Hong, Joey; Levine, Sergey; Tomizuka, Masayoshi
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.04534

Similar Items

Bisimulation metric for Model Predictive Control
by: Shimizu, Yutaka, et al.
Published: (2024)

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
by: Hong, Joey, et al.
Published: (2024)

Adaptive Linear Path Model-Based Diffusion
by: Shimizu, Yutaka, et al.
Published: (2026)

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations
by: Hong, Joey, et al.
Published: (2024)

Residual Q-Learning: Offline and Online Policy Customization without Value
by: Li, Chenran, et al.
Published: (2023)

Flow Q-Learning
by: Park, Seohong, et al.
Published: (2025)

Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
by: Hong, Joey, et al.
Published: (2025)

Q-learning with Adjoint Matching
by: Li, Qiyang, et al.
Published: (2026)

Generalizability Analysis of Graph-based Trajectory Predictor with Vectorized Representation
by: Lu, Juanwu, et al.
Published: (2022)

Digi-Q: Learning Q-Value Functions for Training Device-Control Agents
by: Bai, Hao, et al.
Published: (2025)

Decoupled Q-Chunking
by: Li, Qiyang, et al.
Published: (2025)

Grounded Relational Inference: Domain Knowledge Driven Explainable Autonomous Driving
by: Tang, Chen, et al.
Published: (2021)

FDPP: Fine-tune Diffusion Policy with Human Preference
by: Chen, Yuxin, et al.
Published: (2025)

Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning
by: Hao, Ce, et al.
Published: (2023)

BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay
by: Weaver, Catherine, et al.
Published: (2024)

Zero-Overhead Introspection for Adaptive Test-Time Compute
by: Manvi, Rohin, et al.
Published: (2025)

MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
by: Chen, Yuxin, et al.
Published: (2024)

Residual Policy Gradient: A Reward View of KL-regularized Objective
by: Wang, Pengcheng, et al.
Published: (2025)

Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
by: Wang, Qinsi, et al.
Published: (2025)

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
by: Rafailov, Rafael, et al.
Published: (2024)

Unsupervised-to-Online Reinforcement Learning
by: Kim, Junsu, et al.
Published: (2024)

Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment
by: Tian, Ran, et al.
Published: (2024)

Visual Pre-Training on Unlabeled Images using Reinforcement Learning
by: Ghosh, Dibya, et al.
Published: (2025)

Testing Human-Hand Segmentation on In-Distribution and Out-of-Distribution Data in Human-Robot Interactions Using a Deep Ensemble Model
by: Jalayer, Reza, et al.
Published: (2025)

Aligning Flow Map Policies with Optimal Q-Guidance
by: Ziakas, Christos, et al.
Published: (2026)

DADP: Domain Adaptive Diffusion Policy
by: Wang, Pengcheng, et al.
Published: (2026)

Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers
by: Shimizu, Atsushi, et al.
Published: (2026)

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving
by: Fei, Xin, et al.
Published: (2024)

Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking
by: Li, Jiachen, et al.
Published: (2021)

Mildly Conservative Q-Learning for Offline Reinforcement Learning
by: Lyu, Jiafei, et al.
Published: (2022)

Bootstrap Off-policy with World Model
by: Zhan, Guojian, et al.
Published: (2025)

LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation
by: Chang, Wei-Jer, et al.
Published: (2025)

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
by: Liang, Zhixuan, et al.
Published: (2023)

Reinforcement Learning with Action Chunking
by: Li, Qiyang, et al.
Published: (2025)

Multi-Camera View Scaling for Data-Efficient Robot Imitation Learning
by: Xie, Yichen, et al.
Published: (2026)

Behavioral Exploration: Learning to Explore via In-Context Adaptation
by: Wagenmaker, Andrew, et al.
Published: (2025)

Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
by: Zhan, Guojian, et al.
Published: (2025)

Peng's Q($λ$) for Conservative Value Estimation in Offline Reinforcement Learning
by: Kim, Byeongchan, et al.
Published: (2026)

RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes
by: Stachowicz, Kyle, et al.
Published: (2024)

Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning
by: Zheng, Bill Chunyuan, et al.
Published: (2025)