Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Pang, Haozhou; Ding, Tianwei; He, Lanshan; Gan, Qi
Format:	Preprint
Published:	2025
Subjects:	Graphics; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.09645

_version_	1866917954558361600
author	Pang, Haozhou; Ding, Tianwei; He, Lanshan; Gan, Qi
author_facet	Pang, Haozhou; Ding, Tianwei; He, Lanshan; Gan, Qi
contents	Dance serves as a profound and universal expression of human culture, conveying emotions and stories through movements synchronized with music. Although some current works have achieved satisfactory results in the task of single-person dance generation, the field of multi-person dance generation remains relatively novel. In this work, we present a group choreography framework that leverages recent advancements in Large Language Models (LLMs) by modeling the group dance generation problem as a sequence-to-sequence translation task. Our framework consists of a tokenizer that transforms continuous features into discrete tokens, and an LLM that is fine-tuned to predict motion tokens given the audio tokens. We show that with proper tokenization of input modalities and careful design of the LLM training strategies, our framework can generate realistic and diverse group dances while maintaining strong music correlation and dancer-wise consistency. Extensive experiments and evaluations demonstrate that our framework achieves state-of-the-art performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_09645
institution	arXiv
publishDate	2025
record_format	arxiv
title	Global Position Aware Group Choreography using Large Language Model
topic	Graphics; Computation and Language; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2503.09645

Similar Items