author Zhang, Tianle
Yuan, Zhihao
Chi, Dafeng
Liu, Peidong
Li, Dongwei
Hu, Kejun
Zhang, Likui
Nie, Junnan
Wei, Ziming
Chen, Zengjue
Tang, Yili
Li, Jiayi
Xiang, Zhiyuan
Li, Mingyang
Luo, Tianci
Wan, Hanwen
Li, Ao
Zhai, Linbo
Zhan, Zhihao
Bai, Xiaodong
Cai, Jiakun
Cao, Peng
Chen, Kangliang
Chen, Siang
Dai, Yixiang
Di, Shuai
Gong, Yicheng
Gui, Chenguang
Guo, Yucheng
Hao, Peng
He, Qingrong
Huang, Haoyang
Huang, Kunrui
Huang, Zhixuan
Jin, Shibo
Jin, Yixiang
Li, Anson
Li, Dongjiang
Li, Jiawei
Li, Ruodai
Li, Yihang
Li, Yuzhen
Liang, Jiaming
Liu, Fangsheng
Long, Jing
Luo, Mingxi
Pan, Xing
Shen, Hui
Tian, Xiaomeng
Wang, Daming
Wang, Song
Xiong, Junwu
Xu, Hang
Xu, Wanting
Yu, Zhengcheng
Zhang, He
Zhang, Jiyao
Zhao, Lin
Zhou, Chen
Duan, Nan
Zhuang, Yuzheng
Lin, Liang
contents Robotic autonomy in open-world environments is fundamentally limited by insufficient data diversity and poor cross-embodiment generalization. Existing robotic datasets are often limited in scale and task coverage, while large differences across robot embodiments impede the transfer of behavioral knowledge. To address these challenges, we propose JoyAI-RA, a vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation. JoyAI-RA uses a multi-source, multi-level pretraining framework that integrates web data, large-scale egocentric human manipulation videos, simulation-generated trajectories, and real-robot data. By training on heterogeneous multi-source data with explicit action-space unification, JoyAI-RA bridges embodiment gaps, particularly between human manipulation and robotic control, thereby improving cross-embodiment behavior learning. JoyAI-RA outperforms state-of-the-art methods on both simulation and real-world benchmarks, especially on diverse tasks that demand generalization.
format Preprint
id arxiv_https___arxiv_org_abs_2604_20100
institution arXiv
publishDate 2026
record_format arxiv
title JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
topic Robotics
url https://arxiv.org/abs/2604.20100