Bibliographic Details
Main Author:	Antunes, Benjamin A.
Format:	Preprint
Published:	2025
Subjects:	Other Computer Science; Cryptography and Security; Machine Learning
Online Access:	https://arxiv.org/abs/2507.03007

_version_	1866912464919068672
author	Antunes, Benjamin A.
author_facet	Antunes, Benjamin A.
contents	Machine learning (ML) frameworks rely heavily on pseudorandom number generators (PRNGs) for tasks such as data shuffling, weight initialization, dropout, and optimization. Yet the statistical quality and reproducibility of these generators, particularly when integrated into frameworks like PyTorch, TensorFlow, and NumPy, are underexplored. In this paper, we compare the statistical quality of PRNGs used in ML frameworks (Mersenne Twister, PCG, and Philox) against their original C implementations. Using the rigorous TestU01 BigCrush test suite, we evaluate 896 independent random streams for each generator. Our findings challenge claims of statistical robustness, revealing that even generators labeled "crush-resistant" (e.g., PCG, Philox) may fail certain statistical tests. Surprisingly, we observe differences in failure profiles between the native and framework-integrated versions of the same algorithm, suggesting implementation differences between them.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_03007
institution	arXiv
publishDate	2025
record_format	arxiv
title	Statistical Quality and Reproducibility of Pseudorandom Number Generators in Machine Learning technologies
topic	Other Computer Science; Cryptography and Security; Machine Learning
url	https://arxiv.org/abs/2507.03007
