| Main Authors: | Kwon, Patrick; Chen, Chen; Joo, Hanbyul |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2410.13911 |
| _version_ | 1866914172017573888 |
|---|---|
| author | Kwon, Patrick; Chen, Chen; Joo, Hanbyul |
| author_facet | Kwon, Patrick; Chen, Chen; Joo, Hanbyul |
| contents | Recent generative models can synthesize high-quality images, but they often fail to generate humans interacting with objects using their hands. This arises mostly from the model's misunderstanding of such interactions and the hardships of synthesizing intricate regions of the body. In this paper, we propose GraspDiffusion, a novel generative method that creates realistic scenes of human-object interaction. Given a 3D object, GraspDiffusion constructs whole-body poses with control over the object's location relative to the human body, which is achieved by separately leveraging the generative priors for body and hand poses, optimizing them into a joint grasping pose. This pose guides the image synthesis to correctly reflect the intended interaction, creating realistic and diverse human-object interaction scenes. We demonstrate that GraspDiffusion can successfully tackle the relatively uninvestigated problem of generating full-bodied human-object interactions while outperforming previous methods. Our project page is available at https://yj7082126.github.io/graspdiffusion/ |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2410_13911 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2410.13911 |