Staff View: SapiensID: Foundation for Human Recognition :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Minchul; Ye, Dingqiang; Su, Yiyang; Liu, Feng; Liu, Xiaoming
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.04708

_version_	1866912312980406272
author	Kim, Minchul; Ye, Dingqiang; Su, Yiyang; Liu, Feng; Liu, Xiaoming
author_facet	Kim, Minchul; Ye, Dingqiang; Su, Yiyang; Liu, Feng; Liu, Xiaoming
contents	Existing human recognition systems often rely on separate, specialized models for face and body analysis, limiting their effectiveness in real-world scenarios where pose, visibility, and context vary widely. This paper introduces SapiensID, a unified model that bridges this gap, achieving robust performance across diverse settings. SapiensID introduces (i) Retina Patch (RP), a dynamic patch generation scheme that adapts to subject scale and ensures consistent tokenization of regions of interest, (ii) a masked recognition model (MRM) that learns from variable token length, and (iii) Semantic Attention Head (SAH), a module that learns pose-invariant representations by pooling features around key body parts. To facilitate training, we introduce WebBody4M, a large-scale dataset capturing diverse poses and scale variations. Extensive experiments demonstrate that SapiensID achieves state-of-the-art results on various body ReID benchmarks, outperforming specialized models in both short-term and long-term scenarios while remaining competitive with dedicated face recognition systems. Furthermore, SapiensID establishes a strong baseline for the newly introduced challenge of Cross Pose-Scale ReID, demonstrating its ability to generalize to complex, real-world conditions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_04708
institution	arXiv
publishDate	2025
record_format	arxiv
title	SapiensID: Foundation for Human Recognition
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.04708

Similar Items