Bibliographic Details
Main Authors: Wang, Yuan; Li, Mingyu; Chen, Haibo
Format: Preprint
Published: 2025
Subjects: Operating Systems; Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2510.04607
_version_ 1866915890253004800
author Wang, Yuan
Li, Mingyu
Chen, Haibo
author_facet Wang, Yuan
Li, Mingyu
Chen, Haibo
contents Computer-use agents (CUAs) powered by large language models (LLMs) have emerged as a promising approach to automating computer tasks, yet they struggle with the existing human-oriented OS interfaces: graphical user interfaces (GUIs). GUIs force LLMs to decompose high-level goals into lengthy, error-prone sequences of fine-grained actions, resulting in low success rates and an excessive number of LLM calls. We propose Declarative Model Interface (DMI), an abstraction that transforms existing GUIs into three declarative primitives: access, state, and observation, thereby providing novel OS interfaces tailored for LLM agents. Our key idea is policy-mechanism separation: LLMs focus on high-level semantic planning (policy) while DMI handles low-level navigation and interaction (mechanism). DMI does not require modifying the application source code or relying on application programming interfaces (APIs). We evaluate DMI with Microsoft Office Suite (Word, PowerPoint, Excel) on Windows. Integrating DMI into a leading GUI-based agent baseline improves task success rates by 67% and reduces interaction steps by 43.5%. Notably, DMI completes over 61% of successful tasks with a single LLM call.
format Preprint
id arxiv_https___arxiv_org_abs_2510_04607
institution arXiv
publishDate 2025
record_format arxiv
title From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents
topic Operating Systems
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2510.04607
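
Illustrative note (not part of the catalogued record): the abstract names three declarative primitives (access, state, observation) and a policy-mechanism split, but gives no signatures. The toy sketch below is one possible reading under stated assumptions: the `Control` tree, the `ToyDMI` class, the method signatures, and the example control names are all hypothetical and are not taken from the paper. It shows only the general idea that a caller declares a target by name while a mechanism layer works out the navigation steps.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    """A node in a toy GUI tree (hypothetical; not the paper's data model)."""
    name: str
    value: object = None
    children: list = field(default_factory=list)


def find_path(root, target, path=()):
    """Depth-first search for the chain of controls from root to `target`."""
    path = path + (root,)
    if root.name == target:
        return path
    for child in root.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None


class ToyDMI:
    """Hypothetical mechanism layer; only the primitive names come from the abstract."""

    def __init__(self, root):
        self.root = root
        self.clicks = []  # low-level steps performed on the caller's behalf

    def access(self, target):
        """Navigate to a control by name, recording each intermediate step."""
        path = find_path(self.root, target)
        if path is None:
            raise KeyError(target)
        self.clicks.extend(node.name for node in path[1:])
        return path[-1]

    def state(self, target, value):
        """Declare the desired value of a control; navigation is handled here."""
        self.access(target).value = value

    def observation(self, target):
        """Read a control's current value without recording navigation steps."""
        path = find_path(self.root, target)
        if path is None:
            raise KeyError(target)
        return path[-1].value


# Assumed example tree: an application window with a nested font-size control.
ui = Control("Word", children=[
    Control("Home", children=[
        Control("Font", children=[Control("Font Size", value=11)]),
    ]),
])

dmi = ToyDMI(ui)
dmi.state("Font Size", 14)            # one declarative request from the "policy" side
print(dmi.observation("Font Size"))   # 14
print(dmi.clicks)                     # ['Home', 'Font', 'Font Size']
```

In this sketch the caller issues a single request, and the three navigation steps are carried out by the mechanism layer. That is the kind of step reduction the abstract describes, though the real system operates on live GUIs rather than an in-memory tree.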