Bibliographic Details
Main Author: Boddupally, Hema Latha
Format: Digital resource
Language:
Published: Zenodo 2021
Subjects:
Online Access: https://doi.org/10.5281/zenodo.18902004
Summary:
  • Modern enterprise systems increasingly rely on distributed, service-oriented, and cloud-native architectures composed of multiple interacting layers, including infrastructure, platforms, applications, and user-facing services, where failures often propagate across components in nonlinear and time-dependent ways. While these architectures deliver scalability, resilience, and rapid innovation, they also introduce significant challenges for fault diagnosis and operational troubleshooting due to high system cardinality, dynamic service dependencies, frequent deployments, and heterogeneous telemetry sources such as logs, metrics, and traces. Traditional root cause analysis (RCA) approaches, which depend heavily on manual inspection, static topology assumptions, or rule-based heuristics, are increasingly inadequate in this environment, as they struggle to distinguish true causal signals from correlated noise and cascading effects. This article examines intelligent RCA techniques for multi-layer enterprise systems, focusing on three foundational pillars: distributed tracing to capture end-to-end execution context, data-driven anomaly correlation to identify statistically significant failure indicators, and graph-based dependency modeling to represent and reason about system structure and failure propagation. Drawing on established research and industry practices published between 2000 and 2021, we synthesize how these complementary techniques collectively enable faster, more accurate, and increasingly automated root cause identification, reducing mean time to resolution and improving operational reliability in complex enterprise environments.