Staff View: ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System :: Library Catalog


Bibliographic Details
Main Authors:	Kang, Hao; Li, Ziyang; Yang, Xinyu; Xu, Weili; Chen, Yinfang; Wang, Junxiong; Chen, Beidi; Krishna, Tushar; Xu, Chenfeng; Arora, Simran
Format:	Preprint
Published:	2026
Subjects:	Operating Systems; Multiagent Systems
Online Access:	https://arxiv.org/abs/2602.13692

_version_	1866908877467942912
author	Kang, Hao; Li, Ziyang; Yang, Xinyu; Xu, Weili; Chen, Yinfang; Wang, Junxiong; Chen, Beidi; Krishna, Tushar; Xu, Chenfeng; Arora, Simran
author_facet	Kang, Hao; Li, Ziyang; Yang, Xinyu; Xu, Weili; Chen, Yinfang; Wang, Junxiong; Chen, Beidi; Krishna, Tushar; Xu, Chenfeng; Arora, Simran
contents	Large language models (LLMs) are now used to power complex multi-turn agentic workflows. Existing systems run agentic inference by loosely assembling isolated components: an LLM inference engine (e.g., vLLM) and a tool orchestrator (e.g., Kubernetes). Although agentic workflows involve multiple LLM and tool requests, these systems schedule and allocate resources separately on a per-request basis, without end-to-end knowledge of the workflow. This leads to sub-optimal management of KV cache and tool execution environments. To address these challenges, we propose ThunderAgent, a fast, simple, and program-aware agentic inference system. We first abstract agentic workflows as LLM Programs, enabling a unified view of heterogeneous resources, including KV caches, system states, and external tool assets such as disk memory and network ports. Built upon this abstraction, ThunderAgent introduces a program-aware scheduler and a tool resource manager designed to maximize KV cache hit rates, mitigate memory imbalances, and enable asynchronous environment preparation. Evaluations across coding, routing, and scientific discovery agents demonstrate that ThunderAgent achieves 1.5-3.6x throughput improvements in serving, 1.8-3.9x in RL rollout, and up to 4.2x disk memory savings compared to state-of-the-art inference systems. To facilitate reproducibility and support future development, we open-source the full ThunderAgent system implementation at: https://github.com/Agentic-Kinetics/ThunderAgent.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_13692
institution	arXiv
publishDate	2026
record_format	arxiv
title	ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System
topic	Operating Systems; Multiagent Systems
url	https://arxiv.org/abs/2602.13692
