Staff View: Societal Impacts Research Requires Benchmarks for Creative Composition Tasks :: Library Catalog

Bibliographic Details
Main Authors:	Shen, Judy Hanwen; Guestrin, Carlos
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.06549

_version_	1866915303023181824
author	Shen, Judy Hanwen; Guestrin, Carlos
author_facet	Shen, Judy Hanwen; Guestrin, Carlos
contents	Foundation models that are capable of automating cognitive tasks represent a pivotal technological shift, yet their societal implications remain unclear. These systems promise exciting advances, yet they also risk flooding our information ecosystem with formulaic, homogeneous, and potentially misleading synthetic content. Developing benchmarks grounded in real use cases where these risks are most significant is therefore critical. Through a thematic analysis using 2 million language model user prompts, we identify creative composition tasks as a prevalent usage category where users seek help with personal tasks that require everyday creativity. Our fine-grained analysis identifies mismatches between current benchmarks and usage patterns among these tasks. Crucially, we argue that the same use cases that currently lack thorough evaluations can lead to negative downstream impacts. This position paper argues that benchmarks focused on creative composition tasks are a necessary step towards understanding the societal harms of AI-generated content. We call for greater transparency in usage patterns to inform the development of new benchmarks that can effectively measure both the progress and the impacts of models with creative capabilities.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_06549
institution	arXiv
publishDate	2025
record_format	arxiv
title	Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
topic	Computers and Society; Artificial Intelligence
url	https://arxiv.org/abs/2504.06549

Similar Items