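Staff View: Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation :: Library Catalog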

Bibliographic Details
Main Authors:	Yan, Haodong; Yu, Hang; Zhong, Zhide; Yuan, Weilin; Gong, Xin; Luo, Zehang; Heyu, Chengxi; Li, Junfeng; Song, Wenxuan; Zhou, Shunbo; Li, Haoang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.01677

_version_	1866909938106761216
author	Yan, Haodong; Yu, Hang; Zhong, Zhide; Yuan, Weilin; Gong, Xin; Luo, Zehang; Heyu, Chengxi; Li, Junfeng; Song, Wenxuan; Zhou, Shunbo; Li, Haoang
author_facet	Yan, Haodong; Yu, Hang; Zhong, Zhide; Yuan, Weilin; Gong, Xin; Luo, Zehang; Heyu, Chengxi; Li, Junfeng; Song, Wenxuan; Zhou, Shunbo; Li, Haoang
contents	Generating realistic hand-object interaction (HOI) videos is a significant challenge due to the difficulty of modeling physical constraints (e.g., contact and occlusion between hands and manipulated objects). Current methods use HOI representations as an auxiliary generative objective to guide video synthesis. However, these methods face a dilemma: neither 2D nor 3D representations can simultaneously guarantee scalability and interaction fidelity. To address this limitation, we propose a structure and contact-aware representation that captures hand-object contact, hand-object occlusion, and holistic structure context without 3D annotations. This interaction-oriented and scalable supervision signal enables the model to learn fine-grained interaction physics and generalize to open-world scenarios. To fully exploit the proposed representation, we introduce a joint-generation paradigm with a share-and-specialization strategy that generates interaction-oriented representations and videos. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on two real-world datasets in generating physics-realistic and temporally coherent HOI videos. Furthermore, our approach exhibits strong generalization to challenging open-world scenarios, highlighting the benefit of our scalable design. Our project page is https://hgzn258.github.io/SCAR/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_01677
institution	arXiv
publishDate	2025
record_format	arxiv
title	Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.01677

Similar Items