Staff View: HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance :: Library Catalog

Bibliographic Details
Main Authors:	Li, Lei; Dai, Angela
Format:	Preprint
Published:	2025
Subjects:	Graphics; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.07209

_version_	1866913145214205952
author	Li, Lei; Dai, Angela
author_facet	Li, Lei; Dai, Angela
contents	We present HOI-PAGE, a new approach that prioritizes part-level affordance reasoning to generate high-fidelity 4D human-object interactions (HOIs) from text prompts in a zero-shot fashion. In contrast to prior works that focus on global, whole body-object motion synthesis, our approach explicitly reasons about the underlying part-level mechanics of interactions using large language models (LLMs). We capture this reasoning in a structured part affordance graph (PAG) representation, serving as a high-level interaction scaffolding to guide a three-stage synthesis: first, decomposing input 3D objects into semantic parts; then, generating reference HOI videos from text prompts to extract part-based motion constraints; and finally, optimizing for 4D HOI motion sequences that mimic the reference dynamics while satisfying part-level contact constraints. Extensive experiments show that our approach is flexible and capable of generating complex multi-object or multi-person interaction sequences, with significantly improved realism and text alignment for zero-shot 4D HOI generation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_07209
institution	arXiv
publishDate	2025
record_format	arxiv
title	HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance
topic	Graphics; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2506.07209
