Bibliographic Details
Main Author: Or, Barak
Format: Preprint
Published: 2025
Subjects: Multiagent Systems; Artificial Intelligence; Systems and Control
Online Access: https://arxiv.org/abs/2511.20663
author Or, Barak
author_facet Or, Barak
contents Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including MTBF and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery behavior across multiple reflex strategies. This work establishes a quantitative foundation for runtime cognitive dependability in distributed agentic systems.
format Preprint
id arxiv_https___arxiv_org_abs_2511_20663
institution arXiv
publishDate 2025
record_format arxiv
title MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
topic Multiagent Systems
Artificial Intelligence
Systems and Control
url https://arxiv.org/abs/2511.20663