Bibliographic Details
Main Authors: Almeida, Guilherme F. C. F., Nunes, José Luiz, Engelmann, Neele, Wiegmann, Alex, de Araújo, Marcelo
Format: Preprint
Published: 2023
Subjects: Artificial Intelligence, Computation and Language
Online Access: https://arxiv.org/abs/2308.01264
_version_ 1866914804196704256
author Almeida, Guilherme F. C. F.
Nunes, José Luiz
Engelmann, Neele
Wiegmann, Alex
de Araújo, Marcelo
contents Large language models (LLMs) exhibit expert-level performance in tasks across a wide range of domains. Ethical issues raised by LLMs and the need to align future versions make it important to know how state-of-the-art models reason about moral and legal issues. In this paper, we employ the methods of experimental psychology to probe this question. We replicate eight studies from the experimental literature with instances of Google's Gemini Pro, Anthropic's Claude 2.1, OpenAI's GPT-4, and Meta's Llama 2 Chat 70b. We find that alignment with human responses shifts from one experiment to another, and that models differ among themselves in their overall alignment, with GPT-4 taking a clear lead over all other models we tested. Nonetheless, even when LLM-generated responses are highly correlated with human responses, there are still systematic differences, with a tendency for models to exaggerate effects that are present among humans, in part by reducing variance. This recommends caution with regard to proposals to replace human participants with current state-of-the-art LLMs in psychological research and highlights the need for further research into the distinctive aspects of machine psychology.
format Preprint
id arxiv_https___arxiv_org_abs_2308_01264
institution arXiv
publishDate 2023
record_format arxiv
title Exploring the psychology of LLMs' Moral and Legal Reasoning
topic Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2308.01264