Bibliographic Details
Main Authors: Zheng, Zihan, Cui, Tianle, Wang, Taoran, Wang, Fengtao, Pan, Jiahui, He, Lewei, Chen, Qianglong
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2508.01330
author Zheng, Zihan
Cui, Tianle
Wang, Taoran
Wang, Fengtao
Pan, Jiahui
He, Lewei
Chen, Qianglong
contents Despite significant advances in LLM-driven GUI agents, the field remains constrained by the challenge of reconciling high-fidelity realism with verifiable evaluation accuracy. To address this, we introduce NaturalGAIA, a verifiable evaluation dataset grounded in real-world human GUI interaction intents. By decoupling logical causal pathways from linguistic narratives, it rigorously simulates natural human intent, characterized by cognitive non-linearity and contextual dependencies. Furthermore, we propose LightManus-Jarvis, a hierarchical collaborative framework where LightManus manages dynamic topological planning and context evolution, while Jarvis ensures execution precision via hybrid visual-structural perception. Experiments demonstrate that our approach achieves a Weighted Pathway Success Rate of 45.6%, significantly outperforming the state-of-the-art baseline (21.1%), while reducing token consumption by 75% and execution time by 76%. These results validate the efficacy of the macro-planning and micro-execution paradigm in handling complex naturalized tasks. Our code is publicly available at: https://github.com/KeLes-Coding/NatureGAIA.
format Preprint
id arxiv_https___arxiv_org_abs_2508_01330
institution arXiv
publishDate 2025
record_format arxiv
title NaturalGAIA: A Verifiable Benchmark and Hierarchical Framework for Long-Horizon GUI Tasks
topic Artificial Intelligence
url https://arxiv.org/abs/2508.01330