MARC21 :: Library Catalog

Bibliographic Details
Main authors:	Schelb, Julian; Borin, Orr; Garcia, David; Spitz, Andreas
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2503.10229

author	Schelb, Julian; Borin, Orr; Garcia, David; Spitz, Andreas
contents	Generative language models are increasingly being subjected to psychometric questionnaires designed for human testing, whether to establish their traits, to benchmark alignment, or to simulate participants in social science experiments. While this growing body of work sheds light on how closely model responses resemble those of humans, there are justified concerns about the rigour and reproducibility with which these experiments are conducted. Unstable model outputs, sensitivity to prompt design and parameter settings, and the large number of available model versions all increase documentation requirements. Consequently, findings are often hard to generalize, and reproducibility is far from guaranteed. In this paper, we present R.U.Psycho, a framework for designing and running robust and reproducible psychometric experiments on generative language models that requires only limited coding expertise. We demonstrate the capabilities of our framework on a variety of psychometric questionnaires, with results that lend support to prior findings in the literature. R.U.Psycho is available as a Python package at https://github.com/julianschelb/rupsycho.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_10229
institution	arXiv
publishDate	2025
record_format	arxiv
title	R.U.Psycho? Robust Unified Psychometric Testing of Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2503.10229
