Staff View: MIST: Towards Multi-dimensional Implicit BiaS Evaluation of LLMs for Theory of Mind :: Library Catalog

Bibliographic Details
Main Authors:	Li, Yanlin; Liu, Hao; Liu, Huimin; Wang, Kun; Wei, Yinwei; Hu, Yupeng
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.14161

_version_	1866914259059867648
author	Li, Yanlin; Liu, Hao; Liu, Huimin; Wang, Kun; Wei, Yinwei; Hu, Yupeng
author_facet	Li, Yanlin; Liu, Hao; Liu, Huimin; Wang, Kun; Wei, Yinwei; Hu, Yupeng
contents	Theory of Mind (ToM) in Large Language Models (LLMs) refers to a model's ability to infer the mental states of others; failures of this ability often manifest as systemic implicit biases. Assessing these biases is difficult because traditional direct-inquiry methods are often met with refusals to answer and fail to capture the biases' subtle, multidimensional nature. We therefore propose MIST, which reconceptualizes the Stereotype Content Model as multidimensional failures of ToM in three domains: competence, sociability, and morality. The framework introduces two indirect tasks: the Word Association Bias Test (WABT), which assesses implicit lexical associations, and the Affective Attribution Test (AAT), which measures implicit emotional tendencies. Both aim to uncover latent stereotypes without triggering model avoidance. In extensive experiments on eight state-of-the-art LLMs, the framework reveals complex bias structures and shows improved robustness. All data and code will be released.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_14161
institution	arXiv
publishDate	2025
record_format	arxiv
title	MIST: Towards Multi-dimensional Implicit BiaS Evaluation of LLMs for Theory of Mind
topic	Computation and Language
url	https://arxiv.org/abs/2506.14161

Similar Items