Staff View: GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction :: Library Catalog

Bibliographic Details
Main Authors:	Kwon, Patrick; Chen, Chen; Joo, Hanbyul
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.13911

_version_	1866914172017573888
author	Kwon, Patrick; Chen, Chen; Joo, Hanbyul
author_facet	Kwon, Patrick; Chen, Chen; Joo, Hanbyul
contents	Recent generative models can synthesize high-quality images, but they often fail to generate humans interacting with objects using their hands. This arises mostly from the model's misunderstanding of such interactions and the hardships of synthesizing intricate regions of the body. In this paper, we propose GraspDiffusion, a novel generative method that creates realistic scenes of human-object interaction. Given a 3D object, GraspDiffusion constructs whole-body poses with control over the object's location relative to the human body, which is achieved by separately leveraging the generative priors for body and hand poses, optimizing them into a joint grasping pose. This pose guides the image synthesis to correctly reflect the intended interaction, creating realistic and diverse human-object interaction scenes. We demonstrate that GraspDiffusion can successfully tackle the relatively uninvestigated problem of generating full-bodied human-object interactions while outperforming previous methods. Our project page is available at https://yj7082126.github.io/graspdiffusion/
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_13911
institution	arXiv
publishDate	2024
record_format	arxiv
title	GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2410.13911

Similar Items