Bibliographic Details
Main Authors:	Ren, Peng; Ge, Haoyang; Qi, Chuan; Huang, Cong; Li, Hong; Zhao, Jiang; Chi, Pei; Chen, Kai
Format:	Preprint
Published:	2026
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2603.10675

_version_	1866914386274156544
author	Ren, Peng; Ge, Haoyang; Qi, Chuan; Huang, Cong; Li, Hong; Zhao, Jiang; Chi, Pei; Chen, Kai
author_facet	Ren, Peng; Ge, Haoyang; Qi, Chuan; Huang, Cong; Li, Hong; Zhao, Jiang; Chi, Pei; Chen, Kai
contents	Robots are increasingly expected to execute open-ended natural-language requests in human environments, which demands reliable long-horizon execution under partial observability. This is especially challenging for humanoids because locomotion and manipulation are tightly coupled through stance, reachability, and balance. We present a humanoid agent framework that turns VLM plans into verifiable task programs and closes the loop with multi-object 3D geometric supervision. A VLM planner compiles each instruction into a typed JSON sequence of subtasks with explicit predicate-based preconditions and success conditions. Using SAM3 and RGB-D, we ground all task-relevant entities in 3D, estimate object centroids and extents, and evaluate predicates over stable frames to obtain condition-level diagnostics. The supervisor uses these diagnostics to verify subtask completion and to provide condition-level feedback for progression and replanning. We execute each subtask by coordinating humanoid locomotion and whole-body manipulation, selecting feasible motion primitives under reachability and balance constraints. Experiments on tabletop manipulation and long-horizon humanoid loco-manipulation tasks show improved robustness from multi-object grounding, temporal stability, and recovery-driven replanning.
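The abstract combines three mechanisms: a typed JSON subtask program with predicate-based conditions, 3D grounding of entities as centroids and extents, and predicate evaluation over stable frames that yields condition-level diagnostics. The Python sketch below illustrates how these pieces could fit together under stated assumptions; it is not the authors' implementation. It assumes each entity has already been segmented (for example from SAM3 masks) and back-projected from RGB-D into a gravity-aligned frame with z up. The JSON field names (`action`, `args`, `preconditions`, `success`), the `on` predicate with its 2 cm tolerance, the `PREDICATES` registry, and the k-consecutive-frame stability rule are all hypothetical choices.

```python
import json
from collections import deque
from dataclasses import dataclass

import numpy as np


@dataclass
class Box3D:
    centroid: np.ndarray  # (3,) center of the axis-aligned box, world frame, z up
    extent: np.ndarray    # (3,) box size along x, y, z


def box_from_points(points: np.ndarray) -> Box3D:
    """Summarize one object's segmented 3D points (N x 3) as centroid + extent."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    return Box3D(centroid=(lo + hi) / 2, extent=hi - lo)


def on(a: Box3D, b: Box3D, z_tol: float = 0.02) -> bool:
    """Hypothetical 'on' predicate: a's bottom is near b's top and a's center lies over b."""
    a_bottom = a.centroid[2] - a.extent[2] / 2
    b_top = b.centroid[2] + b.extent[2] / 2
    over_b = np.all(np.abs(a.centroid[:2] - b.centroid[:2]) <= b.extent[:2] / 2)
    return bool(abs(a_bottom - b_top) <= z_tol and over_b)


PREDICATES = {"on": on}  # hypothetical registry: predicate name -> geometric test


@dataclass
class Subtask:
    action: str
    args: list
    preconditions: list  # e.g. [{"pred": "on", "args": ["tray", "table"]}]
    success: list


def parse_plan(text: str) -> list[Subtask]:
    """Parse a JSON subtask list; missing or unexpected keys raise TypeError."""
    return [Subtask(**step) for step in json.loads(text)]


class StableEvaluator:
    """Report a condition as true only after it has held for k consecutive frames."""

    def __init__(self, k: int = 5):
        self.k = k
        self.history = {}  # condition key -> deque of the last k raw results

    def update(self, conditions: list, boxes: dict) -> dict:
        diagnostics = {}
        for cond in conditions:
            key = f"{cond['pred']}({', '.join(cond['args'])})"
            args = [boxes.get(name) for name in cond["args"]]
            # A condition over an entity that was not grounded this frame counts as false.
            raw = all(a is not None for a in args) and PREDICATES[cond["pred"]](*args)
            hist = self.history.setdefault(key, deque(maxlen=self.k))
            hist.append(raw)
            diagnostics[key] = len(hist) == self.k and all(hist)
        return diagnostics


plan = parse_plan('''[{"action": "place_on", "args": ["cup", "tray"],
  "preconditions": [{"pred": "on", "args": ["tray", "table"]}],
  "success": [{"pred": "on", "args": ["cup", "tray"]}]}]''')

boxes = {
    "table": box_from_points(np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.75]])),
    "tray": box_from_points(np.array([[0.3, 0.3, 0.75], [0.7, 0.7, 0.77]])),
    "cup": box_from_points(np.array([[0.45, 0.45, 0.77], [0.55, 0.55, 0.87]])),
}
ev = StableEvaluator(k=3)
for _ in range(3):
    print(ev.update(plan[0].success, boxes))
# {'on(cup, tray)': False} twice, then {'on(cup, tray)': True}
```

Requiring k consecutive agreeing frames is one simple way to get temporal stability; it trades a short confirmation delay for robustness to single-frame segmentation or depth noise. A supervisor built on this could treat a subtask as complete once every success condition reports true, and pass the keys that remain false (such as `on(cup, tray)`) back to the planner as condition-level feedback for replanning.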
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_10675
institution	arXiv
publishDate	2026
record_format	arxiv
title	Cybo-Waiter: A Physical Agentic Framework for Humanoid Whole-Body Locomotion-Manipulation
topic	Robotics
url	https://arxiv.org/abs/2603.10675
