Staff View: MobileAgent: enhancing mobile control via human-machine interaction and SOP integration :: Library Catalog

Bibliographic Details
Main Author:	Ding, Tinghe
Format:	Preprint
Published:	2024
Subjects:	Human-Computer Interaction; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2401.04124

_version_	1866913198530101248
author	Ding, Tinghe
author_facet	Ding, Tinghe
contents	Agents centered around Large Language Models (LLMs) are now capable of automating mobile device operations for users. After fine-tuning to learn a user's mobile operations, these agents can adhere to high-level user instructions online. They execute tasks such as goal decomposition, sequencing of sub-goals, and interactive environmental exploration, until the final objective is achieved. However, privacy concerns related to personalized user data arise during mobile operations, requiring user confirmation. Moreover, users' real-world operations are exploratory, with action data being complex and redundant, posing challenges for agent learning. To address these issues, in our practical application, we have designed interactive tasks between agents and humans to identify sensitive information and align with personalized user needs. Additionally, we integrated Standard Operating Procedure (SOP) information within the model's in-context learning to enhance the agent's comprehension of complex task execution. Our approach is evaluated on the new device control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks, including application operation, web searching, and web shopping. Experimental results show that the SOP-based agent achieves state-of-the-art performance in LLMs without incurring additional inference costs, boasting an overall action success rate of 66.92%. The code and data examples are available at https://github.com/alipay/mobile-agent.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_04124
institution	arXiv
publishDate	2024
record_format	arxiv
title	MobileAgent: enhancing mobile control via human-machine interaction and SOP integration
topic	Human-Computer Interaction; Artificial Intelligence
url	https://arxiv.org/abs/2401.04124

Similar Items