Staff View: CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists :: Library Catalog

Bibliographic Details
Main Authors:	Yang, Junlin; Zhang, Dylan; Song, Xiangchen; Dai, Qirun; Liu, Xiao; Chen, Yuen; Vashishtha, Aniket; Shi, Jing; Tan, Chenhao; Peng, Hao
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2605.26029

_version_	1866913169435262976
author	Yang, Junlin; Zhang, Dylan; Song, Xiangchen; Dai, Qirun; Liu, Xiao; Chen, Yuen; Vashishtha, Aniket; Shi, Jing; Tan, Chenhao; Peng, Hao
author_facet	Yang, Junlin; Zhang, Dylan; Song, Xiangchen; Dai, Qirun; Liu, Xiao; Chen, Yuen; Vashishtha, Aniket; Shi, Jing; Tan, Chenhao; Peng, Hao
contents	We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using causal evidence and whether its answer is grounded in a faithful recovered causal mechanism. Each episode places an agent in a synthetic laboratory: it receives prior measurement records, intervenes on a manipulator crystal, and predicts the resonance frequency of a held-out reactor crystal governed by the same mechanism. The hidden data-generating process is a randomly sampled structural causal model (SCM), so success requires recovering both a causal graph and structural equations rather than recalling prior knowledge. Experiments show a persistent gap between prediction and mechanism recovery: in the purely observational 6-node setting, GPT-5.2-high reaches 92% task accuracy but only 0.471 all-edge $F_1$. Mixed observation-intervention strategies improve structural fidelity, while pure intervention remains difficult even for strong agents. We identify premature stopping as a major weakness and show that consistency verification mitigates it. CausaLab therefore separates predictive success from causal understanding and exposes current LLM agents' limits as experimental causal reasoners.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_26029
institution	arXiv
publishDate	2026
record_format	arxiv
title	CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2605.26029
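The abstract states that each episode's hidden data-generating process is a randomly sampled structural causal model (SCM), and that agents gather evidence by observing and by intervening. The sketch below illustrates that setup under assumptions the record does not specify: a linear-Gaussian SCM over a random DAG, with hypothetical names `LinearSCM`, `sample_scm`, and `sample`. It is not CausaLab's generator; the functional form, noise model, edge probability, and the mapping of nodes to crystals or resonance frequency are all illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LinearSCM:
    """Hypothetical linear-Gaussian SCM; nodes 0..n-1 are in topological order."""
    n: int
    parents: Dict[int, List[int]]            # child -> list of parent indices
    weights: Dict[Tuple[int, int], float]    # (parent, child) -> edge coefficient
    noise_std: float = 0.1                   # assumed additive Gaussian noise scale


def sample_scm(n: int, edge_prob: float, rng: random.Random) -> LinearSCM:
    """Sample a random DAG plus linear structural equations (illustrative only)."""
    parents: Dict[int, List[int]] = {j: [] for j in range(n)}
    weights: Dict[Tuple[int, int], float] = {}
    for j in range(n):
        for i in range(j):  # edges only go from lower to higher index, so no cycles
            if rng.random() < edge_prob:
                parents[j].append(i)
                # Coefficient magnitude in [0.5, 2.0] with a random sign.
                weights[(i, j)] = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 2.0)
    return LinearSCM(n, parents, weights)


def sample(scm: LinearSCM, rng: random.Random,
           do: Optional[Dict[int, float]] = None) -> List[float]:
    """Draw one sample; nodes listed in `do` are clamped (a hard intervention)."""
    do = do or {}
    x = [0.0] * scm.n
    for j in range(scm.n):  # topological order guarantees parents are computed first
        if j in do:
            x[j] = do[j]  # do(X_j = v): ignore parents and noise
        else:
            x[j] = sum(scm.weights[(i, j)] * x[i] for i in scm.parents[j])
            x[j] += rng.gauss(0.0, scm.noise_std)
    return x


# Usage: a 6-node hidden SCM, then purely observational vs interventional data.
rng = random.Random(0)
scm = sample_scm(6, edge_prob=0.4, rng=rng)
observational = [sample(scm, rng) for _ in range(200)]
interventional = [sample(scm, rng, do={0: 2.0}) for _ in range(200)]
```

Clamping a node cuts its incoming edges, which is why interventional samples can disambiguate edge directions that observational samples alone may leave open.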
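The reported gap between task accuracy and all-edge $F_1$ rests on comparing a recovered causal graph with the hidden one. The sketch below shows one plausible edge-level F1, assuming edges are directed (parent, child) pairs and that a reversed edge counts as both a false positive and a false negative. The paper's exact definition of all-edge $F_1$ may differ (for example, it might score the undirected skeleton or all node pairs), and the function name `edge_f1` is hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # (parent, child)


def edge_f1(true_edges: Set[Edge], pred_edges: Set[Edge]) -> float:
    """F1 over directed edges; a reversed edge is scored as wrong (assumption)."""
    true_set, pred_set = set(true_edges), set(pred_edges)
    if not true_set and not pred_set:
        return 1.0  # assumed convention: an empty graph recovered exactly
    tp = len(true_set & pred_set)  # edges present with the correct direction
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


true_graph = {(0, 1), (1, 2), (0, 2)}
predicted = {(0, 1), (2, 1)}  # one correct edge, one reversed, one missing
print(round(edge_f1(true_graph, predicted), 3))  # 0.4
```

Here precision is 1/2 and recall is 1/3, so F1 is 0.4. A score like this is independent of whether the agent's downstream prediction was correct, which is what lets an evaluation separate predictive success from mechanism recovery.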

Similar Items