Staff View: Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data :: Library Catalog

Bibliographic Details
Main Authors:	Fan, Ke; Lu, Shunlin; Dai, Minyue; Yu, Runyi; Xiao, Lixing; Dou, Zhiyang; Dong, Junting; Ma, Lizhuang; Wang, Jingbo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.07095

_version_	1866915379982368768
author	Fan, Ke; Lu, Shunlin; Dai, Minyue; Yu, Runyi; Xiao, Lixing; Dou, Zhiyang; Dong, Junting; Ma, Lizhuang; Wang, Jingbo
author_facet	Fan, Ke; Lu, Shunlin; Dai, Minyue; Yu, Runyi; Xiao, Lixing; Dou, Zhiyang; Dong, Junting; Ma, Lizhuang; Wang, Jingbo
contents	Generating diverse and natural human motion sequences based on textual descriptions constitutes a fundamental and challenging research area within the domains of computer vision, graphics, and robotics. Despite significant advancements in this field, current methodologies often face challenges regarding zero-shot generalization capabilities, largely attributable to the limited size of training datasets. Moreover, the lack of a comprehensive evaluation framework impedes the advancement of this task by failing to identify directions for improvement. In this work, we aim to push text-to-motion into a new era, that is, to achieve zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion, the largest human motion dataset to date, featuring over 2,000 hours and 2 million high-quality motion sequences. Additionally, we propose MotionMillion-Eval, the most comprehensive benchmark for evaluating zero-shot motion generation. Leveraging a scalable architecture, we scale our model to 7B parameters and validate its performance on MotionMillion-Eval. Our results demonstrate strong generalization to out-of-domain and complex compositional motions, marking a significant step toward zero-shot human motion generation. The code is available at https://github.com/VankouF/MotionMillion-Codes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_07095
institution	arXiv
publishDate	2025
record_format	arxiv
title	Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2507.07095
