Bibliographic Details
Main Authors: Shen, Judy Hanwen, Guestrin, Carlos
Format: Preprint
Published: 2025
Subjects: Computers and Society, Artificial Intelligence
Online Access: https://arxiv.org/abs/2504.06549
author Shen, Judy Hanwen
Guestrin, Carlos
contents Foundation models that are capable of automating cognitive tasks represent a pivotal technological shift, yet their societal implications remain unclear. These systems promise exciting advances, but they also risk flooding our information ecosystem with formulaic, homogeneous, and potentially misleading synthetic content. Developing benchmarks grounded in real use cases where these risks are most significant is therefore critical. Through a thematic analysis of 2 million language model user prompts, we identify creative composition tasks as a prevalent usage category where users seek help with personal tasks that require everyday creativity. Our fine-grained analysis identifies mismatches between current benchmarks and usage patterns among these tasks. Crucially, we argue that the same use cases that currently lack thorough evaluations can lead to negative downstream impacts. This position paper argues that benchmarks focused on creative composition tasks are a necessary step towards understanding the societal harms of AI-generated content. We call for greater transparency in usage patterns to inform the development of new benchmarks that can effectively measure both the progress and the impacts of models with creative capabilities.
format Preprint
id arxiv_https___arxiv_org_abs_2504_06549
institution arXiv
publishDate 2025
record_format arxiv
title Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
topic Computers and Society
Artificial Intelligence
url https://arxiv.org/abs/2504.06549