Bibliographic Details
Main Authors: Valassakis, Eugene; Garcia-Hernando, Guillermo
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2407.15844
author Valassakis, Eugene
Garcia-Hernando, Guillermo
author_facet Valassakis, Eugene
Garcia-Hernando, Guillermo
contents Predicting camera-space hand meshes from single RGB images is crucial for enabling realistic hand interactions in 3D virtual and augmented worlds. Previous work typically divided the task into two stages: given a cropped image of the hand, predict meshes in relative coordinates, followed by lifting these predictions into camera space in a separate and independent stage, often resulting in the loss of valuable contextual and scale information. To prevent the loss of these cues, we propose unifying these two stages into an end-to-end solution that addresses the 2D-3D correspondence problem. This solution enables back-propagation from camera space outputs to the rest of the network through a new differentiable global positioning module. We also introduce an image rectification step that harmonizes both the training dataset and the input image as if they were acquired with the same camera, helping to alleviate the inherent scale-depth ambiguity of the problem. We validate the effectiveness of our framework in evaluations against several baselines and state-of-the-art approaches across three public benchmarks.
format Preprint
id arxiv_https___arxiv_org_abs_2407_15844
institution arXiv
publishDate 2024
record_format arxiv
title HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Machine Learning
Robotics
url https://arxiv.org/abs/2407.15844