Staff View: InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Sirui; Wang, Ziyin; Wang, Yu-Xiong; Gui, Liang-Yan
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2403.19652

_version_	1866910006839869440
author	Xu, Sirui; Wang, Ziyin; Wang, Yu-Xiong; Gui, Liang-Yan
author_facet	Xu, Sirui; Wang, Ziyin; Wang, Yu-Xiong; Gui, Liang-Yan
contents	Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions. This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. Being unable to learn interaction semantics through supervised training, we instead leverage pre-trained large models, synergizing knowledge from a large language model and a text-to-motion model. While such knowledge offers high-level control over interaction semantics, it cannot grasp the intricacies of low-level interaction dynamics. To overcome this issue, we further introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion. By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner. We apply InterDreamer to the BEHAVE and CHAIRS datasets, and our comprehensive experimental analysis demonstrates its capability to generate realistic and coherent interaction sequences that seamlessly align with the text directives.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_19652
institution	arXiv
publishDate	2024
record_format	arxiv
title	InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2403.19652

Similar Items