Internal format: :: Library Catalog

Bibliographic Details
Main authors:	Pan, Liang; Yang, Zeshi; Dou, Zhiyang; Wang, Wenjia; Huang, Buzhen; Dai, Bo; Komura, Taku; Wang, Jingbo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2503.19901

_version_	1866915224433459200
author	Pan, Liang; Yang, Zeshi; Dou, Zhiyang; Wang, Wenjia; Huang, Buzhen; Dai, Bo; Komura, Taku; Wang, Jingbo
author_facet	Pan, Liang; Yang, Zeshi; Dou, Zhiyang; Wang, Wenjia; Huang, Buzhen; Dai, Bo; Komura, Taku; Wang, Jingbo
contents	Synthesizing diverse and physically plausible Human-Scene Interactions (HSI) is pivotal for both computer animation and embodied AI. Despite encouraging progress, current methods mainly focus on developing separate controllers, each specialized for a specific interaction task. This significantly hinders the ability to tackle a wide variety of challenging HSI tasks that require the integration of multiple skills, e.g., sitting down while carrying an object. To address this issue, we present TokenHSI, a single, unified transformer-based policy capable of multi-skill unification and flexible adaptation. The key insight is to model the humanoid proprioception as a separate shared token and combine it with distinct task tokens via a masking mechanism. Such a unified policy enables effective knowledge sharing across skills, thereby facilitating the multi-task training. Moreover, our policy architecture supports variable length inputs, enabling flexible adaptation of learned skills to new scenarios. By training additional task tokenizers, we can not only modify the geometries of interaction targets but also coordinate multiple skills to address complex tasks. The experiments demonstrate that our approach can significantly improve versatility, adaptability, and extensibility in various HSI tasks. Website: https://liangpan99.github.io/TokenHSI/
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_19901
institution	arXiv
publishDate	2025
record_format	arxiv
title	TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2503.19901
