Staff View: Large Language Models Do Not Simulate Human Psychology :: Library Catalog

Bibliographic Details
Main Authors:	Schröder, Sarah; Morgenroth, Thekla; Kuhl, Ulrike; Vaquet, Valerie; Paaßen, Benjamin
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.06950

_version_	1866916895225020416
author	Schröder, Sarah; Morgenroth, Thekla; Kuhl, Ulrike; Vaquet, Valerie; Paaßen, Benjamin
author_facet	Schröder, Sarah; Morgenroth, Thekla; Kuhl, Ulrike; Vaquet, Valerie; Paaßen, Benjamin
contents	Large Language Models (LLMs), such as ChatGPT, are increasingly used in research, for tasks ranging from simple writing assistance to complex data annotation. Recently, some research has suggested that LLMs may even be able to simulate human psychology and could therefore replace human participants in psychological studies. We caution against this approach. We provide conceptual arguments against the hypothesis that LLMs simulate human psychology. We then present empirical evidence illustrating our arguments: slight changes in wording that correspond to large changes in meaning lead to notable discrepancies between LLM and human responses, even for the recent CENTAUR model, which was specifically fine-tuned on psychological responses. Additionally, different LLMs respond very differently to novel items, further illustrating their lack of reliability. We conclude that LLMs do not simulate human psychology and recommend that psychological researchers treat LLMs as useful but fundamentally unreliable tools that must be validated against human responses for every new application.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_06950
institution	arXiv
publishDate	2025
record_format	arxiv
title	Large Language Models Do Not Simulate Human Psychology
topic	Artificial Intelligence
url	https://arxiv.org/abs/2508.06950
