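Staff View: Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces :: Library Catalog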

Bibliographic Details
Main Authors:	Chen, Jiawei; Xu, Ruoxi; Cao, Boxi; Pan, Ruotong; Zhang, Yunfei; Hu, Yifei; Du, Yong; Gao, Tingting; Lu, Yaojie; Sun, Yingfei; Han, Xianpei; Sun, Le; Wu, Xiangyu; Lin, Hongyu
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2604.08362

_version_	1866917518725087232
author	Chen, Jiawei; Xu, Ruoxi; Cao, Boxi; Pan, Ruotong; Zhang, Yunfei; Hu, Yifei; Du, Yong; Gao, Tingting; Lu, Yaojie; Sun, Yingfei; Han, Xianpei; Sun, Le; Wu, Xiangyu; Lin, Hongyu
author_facet	Chen, Jiawei; Xu, Ruoxi; Cao, Boxi; Pan, Ruotong; Zhang, Yunfei; Hu, Yifei; Du, Yong; Gao, Tingting; Lu, Yaojie; Sun, Yingfei; Han, Xianpei; Sun, Le; Wu, Xiangyu; Lin, Hongyu
contents	The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to capture the holistic nature of authentic human behavior. To bridge this gap, we introduce OmniBehavior, the first user simulation benchmark constructed entirely from real-world data, integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns into a unified framework. Based on this benchmark, we first provide empirical evidence that previous datasets with isolated scenarios suffer from tunnel vision, whereas real-world decision-making relies on long-term, cross-scenario causal chains. Extensive evaluations of state-of-the-art LLMs reveal that current models struggle to accurately simulate these complex behaviors, with performance plateauing even as context windows expand. Crucially, a systematic comparison between simulated and authentic behaviors uncovers a fundamental structural bias: LLMs tend to converge toward a positive average person, exhibiting hyper-activity, persona homogenization, and a utopian bias. This results in the loss of individual differences and long-tail behaviors, highlighting critical directions for future high-fidelity simulation research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_08362
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces Chen, Jiawei Xu, Ruoxi Cao, Boxi Pan, Ruotong Zhang, Yunfei Hu, Yifei Du, Yong Gao, Tingting Lu, Yaojie Sun, Yingfei Han, Xianpei Sun, Le Wu, Xiangyu Lin, Hongyu Computation and Language Artificial Intelligence Machine Learning The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to capture the holistic nature of authentic human behavior. To bridge this gap, we introduce OmniBehavior, the first user simulation benchmark constructed entirely from real-world data, integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns into a unified framework. Based on this benchmark, we first provide empirical evidence that previous datasets with isolated scenarios suffer from tunnel vision, whereas real-world decision-making relies on long-term, cross-scenario causal chains. Extensive evaluations of state-of-the-art LLMs reveal that current models struggle to accurately simulate these complex behaviors, with performance plateauing even as context windows expand. Crucially, a systematic comparison between simulated and authentic behaviors uncovers a fundamental structural bias: LLMs tend to converge toward a positive average person, exhibiting hyper-activity, persona homogenization, and a utopian bias. This results in the loss of individual differences and long-tail behaviors, highlighting critical directions for future high-fidelity simulation research.
title	Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
topic	Computation and Language; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2604.08362

Similar Items