Staff View: What happens when generative AI models train recursively on each others' outputs? :: Library Catalog

Bibliographic Details
Main Authors:	Vu, Hung Anh; Reeves, Galen; Wenger, Emily
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computers and Society
Online Access:	https://arxiv.org/abs/2505.21677

_version_	1866911188373209088
author	Vu, Hung Anh; Reeves, Galen; Wenger, Emily
author_facet	Vu, Hung Anh; Reeves, Galen; Wenger, Emily
contents	The internet serves as a common source of training data for generative AI (genAI) models but is increasingly populated with AI-generated content. This duality raises the possibility that future genAI models may be trained on other models' generated outputs. Prior work has studied consequences of models training on their own generated outputs, but limited work has considered what happens if models ingest content produced by other models. Given society's increasing dependence on genAI tools, understanding such data-mediated model interactions is critical. This work provides empirical evidence for how data-mediated interactions might unfold in practice, develops a theoretical model for this interactive training process, and experimentally validates the theory. We find that data-mediated interactions can benefit models by exposing them to novel concepts perhaps missed in original training data, but also can homogenize their performance on shared tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_21677
institution	arXiv
publishDate	2025
record_format	arxiv
title	What happens when generative AI models train recursively on each others' outputs?
topic	Machine Learning; Artificial Intelligence; Computers and Society
url	https://arxiv.org/abs/2505.21677
