Staff View: Generative Artificial Intelligence Reproducibility and Consensus :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Edward; Isozaki, Isamu; Sirkin, Naomi; Robson, Michael
Format:	Preprint
Published:	2023
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2307.01898

_version_	1866929234545475584
author	Kim, Edward; Isozaki, Isamu; Sirkin, Naomi; Robson, Michael
author_facet	Kim, Edward; Isozaki, Isamu; Sirkin, Naomi; Robson, Michael
contents	We performed a billion locality-sensitive hash comparisons between artificially generated data samples to answer the critical question: can we reproduce the results of generative AI models? Reproducibility is one of the pillars of scientific research for verifiability, benchmarking, trust, and transparency. Furthermore, we take this research to the next level by verifying the "correctness" of generative AI output in a non-deterministic, trustless, decentralized network. We generate millions of data samples from a variety of open-source diffusion and large language models and describe the procedures and trade-offs between generating more versus less deterministic output. Additionally, we analyze the outputs to provide empirical evidence of different parameterizations of tolerance and error bounds for verification. For our results, we show that with a majority vote between three independent verifiers, we can detect image generated perceptual collisions in generated AI with over 99.89% probability and less than 0.0267% chance of intra-class collision. For large language models (LLMs), we are able to gain 100% consensus using greedy methods or n-way beam searches to generate consensus demonstrated on different LLMs. In the context of generative AI training, we pinpoint and minimize the major sources of stochasticity and present gossip and synchronization training techniques for verifiability. Thus, this work provides a practical, solid foundation for AI verification, reproducibility, and consensus for generative AI applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2307_01898
institution	arXiv
publishDate	2023
record_format	arxiv
title	Generative Artificial Intelligence Reproducibility and Consensus
topic	Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2307.01898
