Bibliographic Details
Main Authors: Mo, Clinton Ansun, Hu, Kun, Long, Chengjiang, Yuan, Dong, Siu, Wan-Chi, Wang, Zhiyong
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2507.20170
author Mo, Clinton Ansun
Hu, Kun
Long, Chengjiang
Yuan, Dong
Siu, Wan-Chi
Wang, Zhiyong
contents Motion skeletons drive 3D character animation by transforming bone hierarchies, but differences in proportions or structure make motion data hard to transfer across skeletons, which poses challenges for data-driven motion synthesis. Temporal Point Clouds (TPCs) offer an unstructured, cross-compatible motion representation. Although TPCs can be converted to and from skeletons, they have mainly served as a compatibility format rather than as a representation for learning motion tasks directly. Learning directly on TPCs would require data synthesis capabilities for the TPC format, which presents unexplored challenges regarding its unique temporal consistency and point identifiability. We therefore propose PUMPS, the first autoencoder architecture for TPC data. PUMPS independently reduces frame-wise point clouds into sampleable feature vectors, from which a decoder extracts distinct temporal points using latent Gaussian noise vectors as sampling identifiers. We introduce linear assignment-based point pairing to optimise the TPC reconstruction process and to remove the need for expensive point-wise attention mechanisms in the architecture. Using these latent features, we pre-train a motion synthesis model capable of motion prediction, transition generation, and keyframe interpolation. On these pre-training tasks, PUMPS matches state-of-the-art performance even without native dataset supervision. When fine-tuned for motion denoising or estimation, PUMPS outperforms many methods dedicated to those tasks without deviating from its generalist architecture.
format Preprint
id arxiv_https___arxiv_org_abs_2507_20170
institution arXiv
publishDate 2025
record_format arxiv
title PUMPS: Skeleton-Agnostic Point-based Universal Motion Pre-Training for Synthesis in Human Motion Tasks
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2507.20170