Bibliographic Details
Main Authors: Gao, Li; Yang, Fuzhi; Chen, Jianhui; Liu, Liu; Zheng, Yao; Cai, Yang; Li, Ziqiao
Format: Preprint
Published: 2026
Subjects: Robotics
Online Access: https://arxiv.org/abs/2603.24021
author Gao, Li
Yang, Fuzhi
Chen, Jianhui
Liu, Liu
Zheng, Yao
Cai, Yang
Li, Ziqiao
contents Despite significant advances in quadrupedal robotics, a critical gap persists in foundational motion resources that holistically integrate diverse locomotion, emotionally expressive behaviors, and rich language semantics, all of which are essential for agile, intuitive human-robot interaction. Current quadruped motion datasets are limited to a few mocap primitives (e.g., walk, trot, sit) and lack diverse behaviors with rich language grounding. To bridge this gap, we introduce Quadruped Foundational Motion (QuadFM), the first large-scale, ultra-high-fidelity dataset designed for text-to-motion generation and general motion control. QuadFM contains 11,784 curated motion clips spanning locomotion, interactive, and emotion-expressive behaviors (e.g., dancing, stretching, peeing). Each clip carries a three-layer annotation (fine-grained action labels, interaction scenarios, and natural language commands), for a total of 35,352 descriptions that support language-conditioned understanding and command execution. We further propose Gen2Control RL, a unified framework that jointly trains a general motion controller and a text-to-motion generator, enabling efficient end-to-end inference on edge hardware. On a real quadruped robot with an NVIDIA Orin, our system achieves real-time motion synthesis (<500 ms latency). Simulation and real-world results show realistic, diverse motions while maintaining robust physical interaction. The dataset will be released at https://github.com/GaoLii/QuadFM.
format Preprint
id arxiv_https___arxiv_org_abs_2603_24021
institution arXiv
publishDate 2026
record_format arxiv
title QuadFM: Foundational Text-Driven Quadruped Motion Dataset for Generation and Control
topic Robotics
url https://arxiv.org/abs/2603.24021