Bibliographic Details
Main Authors:	Batzner, Jan; Stocker, Volker; Tang, Bingjun; Natarajan, Anusha; Chen, Qinhao; Schmid, Stefan; Kasneci, Gjergji
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Computation and Language
Online Access:	https://arxiv.org/abs/2512.00461

_version_	1866909934070792192
author	Batzner, Jan; Stocker, Volker; Tang, Bingjun; Natarajan, Anusha; Chen, Qinhao; Schmid, Stefan; Kasneci, Gjergji
author_facet	Batzner, Jan; Stocker, Volker; Tang, Bingjun; Natarajan, Anusha; Chen, Qinhao; Schmid, Stefan; Kasneci, Gjergji
contents	Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on limited sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_00461
institution	arXiv
publishDate	2025
record_format	arxiv
title	Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency
topic	Computers and Society; Computation and Language
url	https://arxiv.org/abs/2512.00461

Similar Items