Bibliographic Details
Main Authors: Kwon, Patrick; Chen, Chen; Joo, Hanbyul
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2410.13911
_version_ 1866914172017573888
author Kwon, Patrick
Chen, Chen
Joo, Hanbyul
author_facet Kwon, Patrick
Chen, Chen
Joo, Hanbyul
contents Recent generative models can synthesize high-quality images, but they often fail to generate humans interacting with objects using their hands. This arises mostly from the models' misunderstanding of such interactions and the difficulty of synthesizing intricate regions of the body. In this paper, we propose GraspDiffusion, a novel generative method that creates realistic scenes of human-object interaction. Given a 3D object, GraspDiffusion constructs whole-body poses with control over the object's location relative to the human body. It achieves this by separately leveraging generative priors for body and hand poses and optimizing them into a joint grasping pose. This pose guides the image synthesis to correctly reflect the intended interaction, creating realistic and diverse human-object interaction scenes. We demonstrate that GraspDiffusion can successfully tackle the relatively underexplored problem of generating full-body human-object interactions while outperforming previous methods. Our project page is available at https://yj7082126.github.io/graspdiffusion/
format Preprint
id arxiv_https___arxiv_org_abs_2410_13911
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
Kwon, Patrick
Chen, Chen
Joo, Hanbyul
Computer Vision and Pattern Recognition
title GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2410.13911