Staff View: From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yuan; Li, Mingyu; Chen, Haibo
Format:	Preprint
Published:	2025
Subjects:	Operating Systems; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2510.04607

_version_	1866915890253004800
author	Wang, Yuan; Li, Mingyu; Chen, Haibo
author_facet	Wang, Yuan; Li, Mingyu; Chen, Haibo
contents	Computer-use agents (CUAs) powered by large language models (LLMs) have emerged as a promising approach to automating computer tasks, yet they struggle with existing human-oriented OS interfaces, namely graphical user interfaces (GUIs). GUIs force LLMs to decompose high-level goals into lengthy, error-prone sequences of fine-grained actions, resulting in low success rates and an excessive number of LLM calls. We propose the Declarative Model Interface (DMI), an abstraction that transforms existing GUIs into three declarative primitives (access, state, and observation), thereby providing novel OS interfaces tailored for LLM agents. Our key idea is policy-mechanism separation: LLMs focus on high-level semantic planning (policy), while DMI handles low-level navigation and interaction (mechanism). DMI requires neither modifying application source code nor relying on application programming interfaces (APIs). We evaluate DMI with the Microsoft Office suite (Word, PowerPoint, Excel) on Windows. Integrating DMI into a leading GUI-based agent baseline improves task success rates by 67% and reduces interaction steps by 43.5%. Notably, DMI completes over 61% of successful tasks with a single LLM call.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_04607
institution	arXiv
publishDate	2025
record_format	arxiv
title	From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents
topic	Operating Systems; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2510.04607

Similar Items