Bibliographic Details
Main Author: Austin, Derek
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2604.01447
contents Recent 3D Gaussian splatting methods built atop SMPL achieve remarkable visual fidelity while continually increasing the complexity of the overall training architecture. We demonstrate that much of this complexity is unnecessary: by replacing SMPL with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, a minimal pipeline with no learned deformations or pose-dependent corrections achieves the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap. To disentangle pose estimation quality from body model representational capacity, we perform two controlled ablations: translating SAM-3D-Body meshes to SMPL-X, and translating the original dataset's SMPL poses into MHR, with both variants retrained under identical conditions. These ablations confirm that body model expressiveness has been a primary bottleneck in avatar reconstruction, with both mesh representational capacity and pose estimation quality contributing meaningfully to the full pipeline's gains.
institution arXiv
title Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars
topic Computer Vision and Pattern Recognition
Artificial Intelligence