Staff View: Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars :: Library Catalog

Bibliographic Details
Main Author:	Austin, Derek
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.01447

_version_	1866914441578151936
author	Austin, Derek
author_facet	Austin, Derek
contents	Recent 3D Gaussian splatting methods built atop SMPL achieve remarkable visual fidelity while continually increasing the complexity of the overall training architecture. We demonstrate that much of this complexity is unnecessary: by replacing SMPL with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, a minimal pipeline with no learned deformations or pose-dependent corrections achieves the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap. To disentangle pose estimation quality from body model representational capacity, we perform two controlled ablations: translating SAM-3D-Body meshes to SMPL-X, and translating the original dataset's SMPL poses into MHR, both retrained under identical conditions. These ablations confirm that body model expressiveness has been a primary bottleneck in avatar reconstruction, with both mesh representational capacity and pose estimation quality contributing meaningfully to the full pipeline's gains.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_01447
institution	arXiv
publishDate	2026
record_format	arxiv
title	Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2604.01447

Similar Items