Bibliographic Details
Main Authors: Li, Yanlin, Liu, Hao, Liu, Huimin, Wang, Kun, Wei, Yinwei, Hu, Yupeng
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2506.14161
author Li, Yanlin
Liu, Hao
Liu, Huimin
Wang, Kun
Wei, Yinwei
Hu, Yupeng
contents Theory of Mind (ToM) in Large Language Models (LLMs) refers to a model's ability to infer the mental states of others; failures in this ability often manifest as systematic implicit biases. Assessing these biases is difficult: traditional direct-inquiry methods are often met with refusals to answer and fail to capture the subtle, multidimensional nature of bias. We therefore propose MIST, which draws on the Stereotype Content Model to recast stereotypes as multidimensional failures of ToM along three dimensions: competence, sociability, and morality. The framework introduces two indirect tasks: the Word Association Bias Test (WABT), which assesses implicit lexical associations, and the Affective Attribution Test (AAT), which measures implicit emotional tendencies. Both aim to uncover latent stereotypes without triggering model avoidance. Extensive experiments on eight state-of-the-art LLMs show that the framework reveals complex bias structures and offers improved robustness. All data and code will be released.
format Preprint
id arxiv_https___arxiv_org_abs_2506_14161
institution arXiv
publishDate 2025
record_format arxiv
title MIST: Towards Multi-dimensional Implicit BiaS Evaluation of LLMs for Theory of Mind
topic Computation and Language
url https://arxiv.org/abs/2506.14161