Staff View: NetAgentBench: A State-Centric Benchmark for Evaluating Agentic Network Configuration :: Library Catalog

Bibliographic Details
Main Authors:	Twabi, Ahmed; Ding, Yepeng; Kondo, Tohru
Format:	Preprint
Published:	2026
Subjects:	Networking and Internet Architecture; Artificial Intelligence; Formal Languages and Automata Theory
Online Access:	https://arxiv.org/abs/2604.09678

_version_	1866915931416952832
author	Twabi, Ahmed; Ding, Yepeng; Kondo, Tohru
author_facet	Twabi, Ahmed; Ding, Yepeng; Kondo, Tohru
contents	As agentic network management gains popularity, there is a critical need for evaluation frameworks that transcend static, one-shot testing. To address this, we introduce NetAgentBench, a dynamic benchmark that evaluates agent interactions through a Finite State Machine (FSM) formalization guaranteeing determinism, correctness, and bounded execution. This provides the networking landscape with a rigorous foundation to measure complex, multi-turn operational behaviors. Our empirical evaluation of four state-of-the-art LLM agents through diverse network configuration tasks reveals stark deficiencies: while agents can solve basic tasks, they suffer severe exploration meltdowns and coherence collapse during expert-level configurations. Ultimately, NetAgentBench demonstrates that systematically evaluating multi-turn behavioral stability is an indispensable step toward realizing trustworthy, fully autonomous networks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_09678
institution	arXiv
publishDate	2026
record_format	arxiv
title	NetAgentBench: A State-Centric Benchmark for Evaluating Agentic Network Configuration
topic	Networking and Internet Architecture; Artificial Intelligence; Formal Languages and Automata Theory
url	https://arxiv.org/abs/2604.09678

Similar Items