Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Yao, Wei; Zhang, Hongwen; Sun, Yunlian; Tang, Jinhui
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.01730

_version_	1866913184508542976
author	Yao, Wei; Zhang, Hongwen; Sun, Yunlian; Tang, Jinhui
author_facet	Yao, Wei; Zhang, Hongwen; Sun, Yunlian; Tang, Jinhui
contents	The recovery of 3D human meshes from monocular images has advanced significantly in recent years. However, existing models usually ignore spatial and temporal information, which can lead to mesh-image misalignment and temporal discontinuity. To address this, we propose a novel Spatio-Temporal Alignment Fusion (STAF) model. As a video-based model, it exploits coherence cues from human motion through an attention-based Temporal Coherence Fusion Module (TCFM). For spatial mesh-alignment evidence, we extract fine-grained local information by projecting the predicted mesh onto the feature maps. Building on these spatial features, we further introduce a multi-stage adjacent Spatial Alignment Fusion Module (SAFM) to enhance the feature representation of the target frame. In addition, we propose an Average Pooling Module (APM) that lets the model attend to the entire input sequence rather than only the target frame. This markedly improves the smoothness of results recovered from video. Extensive experiments on 3DPW, MPII3D, and H36M demonstrate the superiority of STAF, which achieves a state-of-the-art trade-off between precision and smoothness. Our code and more video results are available on the project page https://yw0208.github.io/staf/
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_01730
institution	arXiv
publishDate	2024
record_format	arxiv
title	STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2401.01730
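The abstract describes three mechanisms only in prose: attention-based fusion of per-frame features toward a target frame (TCFM), sampling feature maps at projected mesh points (the spatial alignment evidence), and averaging over the whole sequence (APM). The NumPy sketch that follows illustrates generic versions of these ideas under stated assumptions. It is NOT the STAF implementation (see the project page for the authors' code). All function names are hypothetical; the attention uses raw features with no learned query/key/value projections; projected vertex positions are assumed to already be in feature-map pixel coordinates; and APM is reduced to a plain temporal mean.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention_fusion(feats, target_idx):
    """Hypothetical TCFM-style fusion (assumption: no learned projections).

    feats: (T, D) per-frame features. The target frame is the query and all
    frames are keys/values; returns a (D,) fused feature for the target frame.
    """
    _, d = feats.shape
    query = feats[target_idx]              # (D,)
    scores = feats @ query / np.sqrt(d)    # (T,) scaled dot-product scores
    weights = softmax(scores)              # (T,) attention weights, sum to 1
    return weights @ feats                 # (D,) weighted sum over frames


def sample_features_at_points(feature_map, points):
    """Bilinearly sample a (C, H, W) feature map at (N, 2) pixel coords (x, y).

    Assumption: points are projected mesh vertices already mapped to
    feature-map pixel coordinates; out-of-range points are clamped.
    Returns (N, C) per-point local features.
    """
    _, h, w = feature_map.shape
    x = np.clip(points[:, 0], 0, w - 1)
    y = np.clip(points[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0                            # horizontal interpolation weight
    wy = y - y0                            # vertical interpolation weight
    f00 = feature_map[:, y0, x0]           # (C, N) top-left neighbours
    f01 = feature_map[:, y0, x1]           # top-right
    f10 = feature_map[:, y1, x0]           # bottom-left
    f11 = feature_map[:, y1, x1]           # bottom-right
    out = (f00 * (1 - wx) * (1 - wy) + f01 * wx * (1 - wy)
           + f10 * (1 - wx) * wy + f11 * wx * wy)
    return out.T                           # (N, C)


def average_pooling_module(per_frame_outputs):
    """Hypothetical APM stand-in: mean over the temporal axis (T, ...) -> (...)."""
    return per_frame_outputs.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((16, 32))          # 16 frames, 32-dim features
    fused = temporal_attention_fusion(feats, target_idx=8)
    fmap = rng.standard_normal((32, 56, 56))       # one frame's feature map
    verts_2d = rng.uniform(0, 55, size=(431, 2))   # hypothetical projected vertices
    local = sample_features_at_points(fmap, verts_2d)
    pooled = average_pooling_module(feats)
    print(fused.shape, local.shape, pooled.shape)
```

Averaging over the sequence, rather than reading only the target frame's output, is one simple way to make the prediction depend on every frame, which is the smoothing motivation the abstract gives for APM.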

Similar Items