Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Zihan; Cui, Tianle; Wang, Taoran; Wang, Fengtao; Pan, Jiahui; He, Lewei; Chen, Qianglong
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.01330

_version_	1866918453662711808
author	Zheng, Zihan; Cui, Tianle; Wang, Taoran; Wang, Fengtao; Pan, Jiahui; He, Lewei; Chen, Qianglong
author_facet	Zheng, Zihan; Cui, Tianle; Wang, Taoran; Wang, Fengtao; Pan, Jiahui; He, Lewei; Chen, Qianglong
contents	Despite significant advances in LLM-driven GUI agents, the field remains constrained by the challenge of reconciling high-fidelity realism with verifiable evaluation accuracy. To address this, we introduce NaturalGAIA, a verifiable evaluation dataset grounded in real-world human GUI interaction intents. By decoupling logical causal pathways from linguistic narratives, it rigorously simulates natural human intent, characterized by cognitive non-linearity and contextual dependencies. Furthermore, we propose LightManus-Jarvis, a hierarchical collaborative framework where LightManus manages dynamic topological planning and context evolution, while Jarvis ensures execution precision via hybrid visual-structural perception. Experiments demonstrate that our approach achieves a Weighted Pathway Success Rate of 45.6%, significantly outperforming the state-of-the-art baseline (21.1%), while reducing token consumption by 75% and execution time by 76%. These results validate the efficacy of the macro-planning and micro-execution paradigm in handling complex naturalized tasks. Our code is publicly available at: https://github.com/KeLes-Coding/NatureGAIA.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_01330
institution	arXiv
publishDate	2025
record_format	arxiv
title	NaturalGAIA: A Verifiable Benchmark and Hierarchical Framework for Long-Horizon GUI Tasks
topic	Artificial Intelligence
url	https://arxiv.org/abs/2508.01330