| Main Authors: | Li, Lei; Dai, Angela |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Graphics; Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2506.07209 |
| _version_ | 1866913145214205952 |
|---|---|
| author | Li, Lei; Dai, Angela |
| author_facet | Li, Lei; Dai, Angela |
| contents | We present HOI-PAGE, a new approach that prioritizes part-level affordance reasoning to generate high-fidelity 4D human-object interactions (HOIs) from text prompts in a zero-shot fashion. In contrast to prior works that focus on global, whole body-object motion synthesis, our approach explicitly reasons about the underlying part-level mechanics of interactions using large language models (LLMs). We capture this reasoning in a structured part affordance graph (PAG) representation, serving as a high-level interaction scaffolding to guide a three-stage synthesis: first, decomposing input 3D objects into semantic parts; then, generating reference HOI videos from text prompts to extract part-based motion constraints; and finally, optimizing for 4D HOI motion sequences that mimic the reference dynamics while satisfying part-level contact constraints. Extensive experiments show that our approach is flexible and capable of generating complex multi-object or multi-person interaction sequences, with significantly improved realism and text alignment for zero-shot 4D HOI generation. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2506_07209 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance |
| topic | Graphics; Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2506.07209 |
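
The abstract describes a structured part affordance graph (PAG) but this record gives no schema for it. As a rough illustration only, the sketch below shows one way a graph linking body parts to object parts could be held in memory. The class names, fields, relation labels, and the example prompt are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PartNode:
    """A named part of the human body or of an object (hypothetical schema)."""
    owner: str  # e.g. "human" or "mug"
    part: str   # e.g. "right_hand" or "handle"


@dataclass
class PartAffordanceGraph:
    """Toy graph: each edge links a body part to an object part with a relation label."""
    edges: dict = field(default_factory=dict)

    def add_contact(self, body_part: PartNode, object_part: PartNode, relation: str) -> None:
        # Store the relation under the (body part, object part) pair.
        self.edges[(body_part, object_part)] = relation

    def contacts_for(self, owner: str) -> list:
        """Return (body part name, object part name, relation) tuples for one object."""
        return [
            (b.part, o.part, rel)
            for (b, o), rel in self.edges.items()
            if o.owner == owner
        ]


def build_example_graph() -> PartAffordanceGraph:
    # Illustrative prompt, not from the paper: "a person lifts a mug by its handle".
    graph = PartAffordanceGraph()
    graph.add_contact(PartNode("human", "right_hand"), PartNode("mug", "handle"), "grasp")
    return graph


graph = build_example_graph()
print(graph.contacts_for("mug"))
```

In a pipeline like the one the abstract outlines, part-level contact constraints would presumably be read off such edges during optimization; how the authors actually encode and use the graph is not stated in this record.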