| Main Authors: | Liu, Hao; Yang, Siyuan; Hu, Qinglei; Li, Dongyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.24573 |
Similar Items
SpaceSeg: A High-Precision Intelligent Perception Segmentation Method for Multi-Spacecraft On-Orbit Targets
by: Liu, Hao, et al.
Published: (2025)
CodeMind: Evaluating Large Language Models for Code Reasoning
by: Liu, Changshu, et al.
Published: (2024)
Reasoning as State Transition: A Representational Analysis of Reasoning Evolution in Large Language Models
by: Zhang, Siyuan, et al.
Published: (2026)
RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models
by: Hu, Xiangkun, et al.
Published: (2024)
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
by: Kim, Hyunwoo, et al.
Published: (2025)
ToMBench: Benchmarking Theory of Mind in Large Language Models
by: Chen, Zhuang, et al.
Published: (2024)
Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models
by: Lin, Zizheng, et al.
Published: (2024)
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)
High-Fidelity Pruning for Large Language Models
by: Zhu, Yijun, et al.
Published: (2026)
Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models
by: Yeo, Gerard Christopher, et al.
Published: (2025)
OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models
by: Xu, Hainiu, et al.
Published: (2024)
Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors
by: Syed, Usman, et al.
Published: (2024)
Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?
by: Shen, Hao, et al.
Published: (2024)
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
by: Wu, Wenshan, et al.
Published: (2024)
ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models
by: Xue, Boyang, et al.
Published: (2025)
MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models
by: Cai, Yuanqing, et al.
Published: (2026)
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
by: Lin, Zhiyu, et al.
Published: (2025)
PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models
by: Tan, Fiona Anting, et al.
Published: (2024)
Evaluating Large Language Models for Financial Reasoning: A CFA-Based Benchmark Study
by: Yao, Xuan, et al.
Published: (2025)
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
by: Gui, Jiayi, et al.
Published: (2024)
Probing Causality Manipulation of Large Language Models
by: Zhang, Chenyang, et al.
Published: (2024)
Simulated Annealing Enhances Theory-of-Mind Reasoning in Autoregressive Language Models
by: Hu, Xucong, et al.
Published: (2026)
AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy
by: Pan, Rui, et al.
Published: (2024)
SpeechR: A Benchmark for Speech Reasoning in Large Audio-Language Models
by: Yang, Wanqi, et al.
Published: (2025)
Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models
by: Liu, Yang, et al.
Published: (2026)
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
by: Zhao, Xueliang, et al.
Published: (2024)
SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation
by: Liu, Hao, et al.
Published: (2026)
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
by: Lu, Yi-Long, et al.
Published: (2025)
EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models
by: Huang, Junquan, et al.
Published: (2025)
GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning
by: Tung, Luu Quy, et al.
Published: (2025)
A Critical Review of Causal Reasoning Benchmarks for Large Language Models
by: Yang, Linying, et al.
Published: (2024)
HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
by: Wei, Yanbin, et al.
Published: (2026)
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
by: Xu, Fengli, et al.
Published: (2025)
TRAM: Benchmarking Temporal Reasoning for Large Language Models
by: Wang, Yuqing, et al.
Published: (2023)
BaZi-Based Character Simulation Benchmark: Evaluating AI on Temporal and Persona Reasoning
by: Zheng, Siyuan, et al.
Published: (2025)
Mind the Motions: Benchmarking Theory-of-Mind in Everyday Body Language
by: Lee, Seungbeen, et al.
Published: (2025)
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
by: Xu, Xin, et al.
Published: (2025)
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
by: Zhu, Yakun, et al.
Published: (2025)
TReB: A Comprehensive Benchmark for Evaluating Table Reasoning Capabilities of Large Language Models
by: Li, Ce, et al.
Published: (2025)
Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models
by: Nickel, Christian, et al.
Published: (2026)