Bibliographic Details
Main Authors: Schröder, Sarah, Morgenroth, Thekla, Kuhl, Ulrike, Vaquet, Valerie, Paaßen, Benjamin
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2508.06950
author Schröder, Sarah
Morgenroth, Thekla
Kuhl, Ulrike
Vaquet, Valerie
Paaßen, Benjamin
contents Large Language Models (LLMs), such as ChatGPT, are increasingly used in research, ranging from simple writing assistance to complex data annotation tasks. Recently, some research has suggested that LLMs may even be able to simulate human psychology and can, hence, replace human participants in psychological studies. We caution against this approach. We provide conceptual arguments against the hypothesis that LLMs simulate human psychology. We then present empirical evidence illustrating our arguments by demonstrating that slight changes to wording that correspond to large changes in meaning lead to notable discrepancies between LLMs' and human responses, even for the recent CENTAUR model that was specifically fine-tuned on psychological responses. Additionally, different LLMs show very different responses to novel items, further illustrating their lack of reliability. We conclude that LLMs do not simulate human psychology and recommend that psychological researchers treat LLMs as useful but fundamentally unreliable tools that need to be validated against human responses for every new application.
format Preprint
institution arXiv
publishDate 2025
title Large Language Models Do Not Simulate Human Psychology
topic Artificial Intelligence
url https://arxiv.org/abs/2508.06950