Bibliographic Details
Main Authors: Lu, Yi-Long, Zhang, Chunhui, Song, Jiajun, Fan, Lifeng, Wang, Wei
Format: Preprint
Published: 2025
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2504.01698
author Lu, Yi-Long
Zhang, Chunhui
Song, Jiajun
Fan, Lifeng
Wang, Wei
contents Theory of Mind (ToM), the ability to attribute mental states to others, is fundamental for human social intelligence and a critical capability for advanced Artificial Intelligence. Recent advancements in Large Language Models (LLMs) have shown promising performance on ToM benchmarks, raising the question: Do these benchmarks necessitate explicit human-like reasoning processes, or can models succeed through alternative strategies? We investigate this question empirically by applying Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) to LLMs of varying scales (0.5B to 7B parameters) and evaluating them across multiple ToM datasets. Our results reveal a scale-dependent impact of RL: while RL significantly improves accuracy and fosters high-quality, interpretable, and transferable belief-tracking reasoning in larger models (7B), it leads to "reasoning collapse" in smaller models (≤3B), where high accuracy and generalization ability are achieved via drastically shortened, less meaningful responses. Surprisingly, further SFT achieves competitive and generalizable performance across these benchmarks, often matching or exceeding RL models in accuracy, despite not being explicitly trained to produce structured reasoning traces. These findings highlight a critical discrepancy between benchmark accuracy and the nature of learned reasoning. Our work suggests that current ToM benchmarks may be solvable without requiring the explicit, human-like simulation of mental states they were designed to probe. LLMs, particularly when scale is limited or training signals focus solely on output correctness, may leverage alternative rules effective for benchmark data structures.
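To make concrete what a training signal that "focuses solely on output correctness" can mean, here is a minimal, hypothetical sketch of an outcome-only reward. It is not the authors' reward function: the `<answer>...</answer>` tag convention, the function name `outcome_only_reward`, and the Sally-Anne style example responses are all illustrative assumptions. The point it shows is that such a reward inspects only the final answer, so a long belief-tracking trace and a collapsed one-line response can earn exactly the same score.

```python
import re

# Hypothetical answer format (assumption): the final answer is wrapped in <answer>...</answer>.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def outcome_only_reward(response: str, gold: str) -> float:
    """Return 1.0 if the extracted final answer matches the gold label, else 0.0.

    Only the final answer is inspected; any reasoning text before it has no
    effect on the score.
    """
    match = ANSWER_PATTERN.search(response)
    if match is None:
        return 0.0  # no parsable answer, no reward
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == gold.strip().lower() else 0.0


# Hypothetical false-belief item: where will Sally look for the ball?
gold = "basket"
long_trace = (
    "<think>Sally put the ball in the basket and left. Anne moved it to the box "
    "while Sally was away, so Sally still believes it is in the basket.</think>"
    "<answer>basket</answer>"
)
collapsed = "<answer>basket</answer>"

for name, resp in [("belief-tracking", long_trace), ("collapsed", collapsed)]:
    # Print the label, a rough length in whitespace-separated tokens, and the reward.
    print(name, len(resp.split()), outcome_only_reward(resp, gold))
```

Both responses receive a reward of 1.0 even though only one contains any belief-tracking reasoning, which is one way a correctness-only signal leaves room for shortened, less meaningful responses.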
format Preprint
id arxiv_https___arxiv_org_abs_2504_01698
institution arXiv
publishDate 2025
record_format arxiv
title Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2504.01698