Internal format :: Library Catalog

Bibliographic details
Main authors:	Zhao, Yuan; Zhu, Hualei; Jiang, Tingyu; Li, Shen; Xu, Xiaohang; Wang, Hao Henry
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online access:	https://arxiv.org/abs/2511.10705

author	Zhao, Yuan; Zhu, Hualei; Jiang, Tingyu; Li, Shen; Xu, Xiaohang; Wang, Hao Henry
contents	Graphical User Interface (GUI) task automation constitutes a critical frontier in artificial intelligence research. While effective GUI agents synergistically integrate planning and grounding capabilities, current methodologies exhibit two fundamental limitations: (1) insufficient exploitation of cross-model synergies, and (2) over-reliance on synthetic data generation without sufficient utilization. To address these challenges, we propose Co-EPG, a self-iterative training framework for Co-Evolution of Planning and Grounding. Co-EPG establishes an iterative positive feedback loop: the planning model explores superior strategies under grounding-based reward guidance via Group Relative Policy Optimization (GRPO), generating diverse data to optimize the grounding model. Concurrently, the optimized grounding model provides more effective rewards for subsequent GRPO training of the planning model, fostering continuous improvement. Co-EPG thus enables iterative enhancement of agent capabilities through self-play optimization and training data distillation. On the Multimodal-Mind2Web and AndroidControl benchmarks, our framework outperforms existing state-of-the-art methods after just three iterations without requiring external data. The agent consistently improves with each iteration, demonstrating robust self-enhancement capabilities. This work establishes a novel training paradigm for GUI agents, shifting from isolated optimization to an integrated, self-driven co-evolution approach.
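The feedback loop in the abstract can be illustrated with a minimal sketch, assuming a simplified setup. It shows (a) the group-relative advantage that GRPO computes, normalizing each sampled plan's reward against the mean and standard deviation of its own group; (b) a hypothetical grounding-based reward that returns 1.0 when a predicted click point falls inside the target element's bounding box; and (c) a hypothetical outer loop that alternates planner training and grounder training. The names `grounding_reward`, `co_evolve`, `train_planner_grpo`, and `train_grounder` are illustrative assumptions, not the authors' code, and the actual Co-EPG reward and training details may differ.

```python
import statistics
from typing import Callable, List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


def grounding_reward(pred_xy: Tuple[float, float],
                     target_box: Tuple[float, float, float, float]) -> float:
    """Hypothetical reward: 1.0 if the predicted click lies inside the target box."""
    x, y = pred_xy
    x0, y0, x1, y1 = target_box
    return 1.0 if x0 <= x <= x1 and y0 <= y <= y1 else 0.0


def co_evolve(planner, grounder,
              train_planner_grpo: Callable, train_grounder: Callable,
              iterations: int = 3):
    """Hypothetical co-evolution loop (training callables are placeholders)."""
    for _ in range(iterations):
        # Planner is trained with GRPO, using the current grounder as reward source,
        # and returns the trajectories it explored.
        planner, trajectories = train_planner_grpo(planner, reward_model=grounder)
        # The explored trajectories are then used to refine the grounder.
        grounder = train_grounder(grounder, trajectories)
    return planner, grounder
```

For a group of four sampled plans where two are grounded correctly (rewards `[1, 0, 1, 0]`), the advantages come out close to `[1, -1, 1, -1]`, so successful plans are reinforced relative to their siblings without a learned value function. This critic-free normalization is the defining feature of GRPO.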
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_10705
institution	arXiv
publishDate	2025
record_format	arxiv
title	Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2511.10705

Similar items