Bibliographic Details
Main Authors: Vu, Hung Anh, Reeves, Galen, Wenger, Emily
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2505.21677
author Vu, Hung Anh
Reeves, Galen
Wenger, Emily
contents The internet serves as a common source of training data for generative AI (genAI) models but is increasingly populated with AI-generated content. This duality raises the possibility that future genAI models may be trained on other models' generated outputs. Prior work has studied consequences of models training on their own generated outputs, but limited work has considered what happens if models ingest content produced by other models. Given society's increasing dependence on genAI tools, understanding such data-mediated model interactions is critical. This work provides empirical evidence for how data-mediated interactions might unfold in practice, develops a theoretical model for this interactive training process, and experimentally validates the theory. We find that data-mediated interactions can benefit models by exposing them to novel concepts perhaps missed in original training data, but also can homogenize their performance on shared tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2505_21677
institution arXiv
publishDate 2025
record_format arxiv
title What happens when generative AI models train recursively on each others' outputs?
topic Machine Learning
Artificial Intelligence
Computers and Society
url https://arxiv.org/abs/2505.21677