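Staff View: Predicting Effects, Missing Distributions: Evaluating LLMs as Human Behavior Simulators in Operations Management :: Library Catalog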

Bibliographic Details
Main Authors:	Zhang, Runze; Zhang, Xiaowei; Zhao, Mingyang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.03310

_version_	1866908574983127040
author	Zhang, Runze; Zhang, Xiaowei; Zhao, Mingyang
author_facet	Zhang, Runze; Zhang, Xiaowei; Zhao, Mingyang
contents	LLMs are emerging tools for simulating human behavior in business, economics, and social science, offering a lower-cost complement to laboratory experiments, field studies, and surveys. This paper evaluates how well LLMs replicate human behavior in operations management. Using nine published experiments in behavioral operations, we assess two criteria: replication of hypothesis-test outcomes and distributional alignment via Wasserstein distance. LLMs reproduce most hypothesis-level effects, capturing key decision biases, but their response distributions diverge from human data, including for strong commercial models. We also test two lightweight interventions -- chain-of-thought prompting and hyperparameter tuning -- which reduce misalignment and can sometimes let smaller or open-source models match or surpass larger systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_03310
institution	arXiv
publishDate	2025
record_format	arxiv
title	Predicting Effects, Missing Distributions: Evaluating LLMs as Human Behavior Simulators in Operations Management
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2510.03310
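
The abstract's second criterion, distributional alignment via Wasserstein distance, can be illustrated with a minimal sketch. This is not the authors' code: the function name `wasserstein_1d` is hypothetical, the numbers are made up for illustration, and the sketch assumes one-dimensional responses with equal sample sizes. It shows how two samples with nearly the same mean can still sit far apart as distributions.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    Hypothetical helper for illustration; assumes len(xs) == len(ys).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size 1-D samples, the optimal transport plan pairs the
    # sorted values in order, so the distance is the mean absolute gap.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Made-up decision values (not data from the paper).
human_responses = [40, 55, 60, 70, 75]   # widely spread, mean 60.0
llm_responses = [58, 60, 60, 61, 62]     # tightly clustered, mean 60.2

distance = wasserstein_1d(human_responses, llm_responses)
print(distance)  # 9.0
```

Here the means differ by only 0.2, so a test on average behavior could agree across the two samples, while the distance of 9.0 reflects the much narrower spread of the simulated responses.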

Similar Items