| Main authors: | Zhang, Tianle; Yuan, Zhihao; Chi, Dafeng; et al. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Robotics |
| Online access: | https://arxiv.org/abs/2604.20100 |
| _version_ | 1866917430227369984 |
| author | Zhang, Tianle; Yuan, Zhihao; Chi, Dafeng; Liu, Peidong; Li, Dongwei; Hu, Kejun; Zhang, Likui; Nie, Junnan; Wei, Ziming; Chen, Zengjue; Tang, Yili; Li, Jiayi; Xiang, Zhiyuan; Li, Mingyang; Luo, Tianci; Wan, Hanwen; Li, Ao; Zhai, Linbo; Zhan, Zhihao; Bai, Xiaodong; Cai, Jiakun; Cao, Peng; Chen, Kangliang; Chen, Siang; Dai, Yixiang; Di, Shuai; Gong, Yicheng; Gui, Chenguang; Guo, Yucheng; Hao, Peng; He, Qingrong; Huang, Haoyang; Huang, Kunrui; Huang, Zhixuan; Jin, Shibo; Jin, Yixiang; Li, Anson; Li, Dongjiang; Li, Jiawei; Li, Ruodai; Li, Yihang; Li, Yuzhen; Liang, Jiaming; Liu, Fangsheng; Long, Jing; Luo, Mingxi; Pan, Xing; Shen, Hui; Tian, Xiaomeng; Wang, Daming; Wang, Song; Xiong, Junwu; Xu, Hang; Xu, Wanting; Yu, Zhengcheng; Zhang, He; Zhang, Jiyao; Zhao, Lin; Zhou, Chen; Duan, Nan; Zhuang, Yuzheng; Lin, Liang |
| contents | Robotic autonomy in open-world environments is fundamentally limited by insufficient data diversity and poor cross-embodiment generalization. Existing robotic datasets are often limited in scale and task coverage, while relatively large differences across robot embodiments impede effective behavior knowledge transfer. To address these challenges, we propose JoyAI-RA, a vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation. JoyAI-RA presents a multi-source multi-level pretraining framework that integrates web data, large-scale egocentric human manipulation videos, simulation-generated trajectories, and real-robot data. Through training on heterogeneous multi-source data with explicit action-space unification, JoyAI-RA effectively bridges embodiment gaps, particularly between human manipulation and robotic control, thereby enhancing cross-embodiment behavior learning. JoyAI-RA outperforms state-of-the-art methods in both simulation and real-world benchmarks, especially on diverse tasks with generalization demands. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2604_20100 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy |
| topic | Robotics |
| url | https://arxiv.org/abs/2604.20100 |