Bibliographic Details
Main Authors: Wang, Puyue, Hu, Jiawei, Gao, Yan, Wang, Junyan, Zhang, Yu, Dobbie, Gillian, Gu, Tao, Johal, Wafa, Dang, Ting, Jia, Hong
Format: Preprint
Published: 2026
Subjects: Robotics, Machine Learning
Online Access: https://arxiv.org/abs/2602.04412
author Wang, Puyue
Hu, Jiawei
Gao, Yan
Wang, Junyan
Zhang, Yu
Dobbie, Gillian
Gu, Tao
Johal, Wafa
Dang, Ting
Jia, Hong
contents Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state-action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.
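The two-stage recipe described in the abstract (a teacher that acts on rich state, a student that acts on sparse keypoint trajectories, and online distillation between them) can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the HoRD implementation: the 1-D point mass, the PD-controller teacher, the linear student over a two-step position history, and the DAgger-style data aggregation are all hypothetical stand-ins for the paper's history-conditioned RL teacher and transformer student. What it shows is what makes distillation "online": states are collected by rolling out the student itself, and the teacher only supplies action labels for those states.

```python
import numpy as np

# Hypothetical toy setup (not from HoRD): a 1-D point mass stands in for the
# humanoid, and the student's "keypoints" are just its recent positions.
DT, KP, KD = 0.05, 4.0, 2.0


def step(p, v, a):
    """Semi-implicit Euler: update velocity first, then position."""
    v_next = v + DT * a
    return p + DT * v_next, v_next


def teacher_action(p, v):
    """Teacher observes the full state (position and velocity): a PD law."""
    return -KP * p - KD * v


def student_obs(p_prev, p):
    """Student observes only a short position trajectory [p_t, p_{t-1}]."""
    return np.array([p, p_prev])


def rollout_student(w, rng, horizon=50):
    """Run the student policy; record (student observation, teacher label)."""
    p, v = rng.normal(), rng.normal()
    p_prev = p - DT * v  # keeps v == (p - p_prev) / DT
    obs, labels = [], []
    for _ in range(horizon):
        o = student_obs(p_prev, p)
        obs.append(o)
        labels.append(teacher_action(p, v))  # teacher labels the visited state
        a = float(w @ o)                     # but the student drives the rollout
        p_prev = p
        p, v = step(p, v, a)
    return obs, labels


def online_distill(n_iters=5, episodes_per_iter=8, seed=0):
    """DAgger-style online distillation into a linear student."""
    rng = np.random.default_rng(seed)
    w = np.zeros(2)  # untrained student
    X, Y = [], []    # dataset aggregated across iterations
    for _ in range(n_iters):
        for _ in range(episodes_per_iter):
            obs, labels = rollout_student(w, rng)
            X.extend(obs)
            Y.extend(labels)
        # Least-squares regression of student actions onto teacher labels.
        w, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
    return w


if __name__ == "__main__":
    print(online_distill())  # approximately [-44. 40.]
```

Because velocity is recoverable from two consecutive positions in this toy, the student can match the teacher exactly, so the fitted weights come out close to [-(KP + KD/DT), KD/DT], i.e. about [-44, 40]. A neural student would typically be trained by gradient descent on an imitation loss rather than fit in closed form.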
format Preprint
id arxiv_https___arxiv_org_abs_2602_04412
institution arXiv
publishDate 2026
record_format arxiv
title HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation
topic Robotics
Machine Learning
url https://arxiv.org/abs/2602.04412