Bibliographic Details
Main Authors: Shen, Yifan; Zhang, Jiawen; Xu, Jian; Kim, Junho; Lourentzou, Ismini; Cao, Xu; Huang, Meihuan
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.17894
Table of Contents:
  • While agentic AI and its core multimodal large language models (MLLMs) have demonstrated remarkable promise in language and visual reasoning across domains ranging from daily life to advanced scientific research, a profound gap remains between artificial and human intelligence. Despite the integration of powerful tools and advanced MLLMs, state-of-the-art AI agents frequently fail at foundational, seemingly simple tasks that a child can resolve with ease. Inspired by the Wechsler Intelligence Scale for Children (WISC), we introduce ChildAgentEval, the first psychometrically grounded interactive benchmark for evaluating cognitive age alignment in MLLM-based agents. ChildAgentEval systematically compares the reasoning performance of various MLLM-based interactive agents against age-specific human developmental stages, exposing where current agentic AI systems can and cannot simulate age-specific cognitive behavior.