Summary: :: Library Catalog

Bibliographic Details
Main Author:	Boddupally, Hema Latha
Format:	Digital resource
Language:
Published:	Zenodo 2021
Subjects:	Root Cause Analysis, Distributed Systems, Enterprise Systems, Observability, AIOps, Distributed Tracing, Dependency Graphs, Log Analytics, Anomaly Detection, Knowledge Graphs
Online Access:	https://doi.org/10.5281/zenodo.18902004

Summary:

Modern enterprise systems increasingly rely on distributed, service-oriented, and cloud-native architectures composed of multiple interacting layers, including infrastructure, platforms, applications, and user-facing services, where failures often propagate across components in nonlinear and time-dependent ways. While these architectures deliver scalability, resilience, and rapid innovation, they also introduce significant challenges for fault diagnosis and operational troubleshooting due to high system cardinality, dynamic service dependencies, frequent deployments, and heterogeneous telemetry sources such as logs, metrics, and traces. Traditional root cause analysis (RCA) approaches, which depend heavily on manual inspection, static topology assumptions, or rule-based heuristics, are increasingly inadequate in this environment, as they struggle to distinguish true causal signals from correlated noise and cascading effects. This article examines intelligent root cause analysis techniques for multi-layer enterprise systems, focusing on three foundational pillars: distributed tracing to capture end-to-end execution context, data-driven anomaly correlation to identify statistically significant failure indicators, and graph-based dependency modeling to represent and reason about system structure and failure propagation. Drawing on established research and industry practices published between 2000 and 2021, we synthesize how these complementary techniques collectively enable faster, more accurate, and increasingly automated root cause identification, reducing mean time to resolution and improving operational reliability in complex enterprise environments.
