| Main Author: | |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2021 |
| Subjects: | |
| Online Access: | https://doi.org/10.5281/zenodo.18902004 |
Summary:
- Modern enterprise systems increasingly rely on distributed, service-oriented, and cloud-native architectures composed of multiple interacting layers, including infrastructure, platforms, applications, and user-facing services, where failures often propagate across components in nonlinear and time-dependent ways. While these architectures deliver scalability, resilience, and rapid innovation, they also introduce significant challenges for fault diagnosis and operational troubleshooting due to high system cardinality, dynamic service dependencies, frequent deployments, and heterogeneous telemetry sources such as logs, metrics, and traces. Traditional root cause analysis (RCA) approaches, which depend heavily on manual inspection, static topology assumptions, or rule-based heuristics, are increasingly inadequate in this environment because they struggle to distinguish true causal signals from correlated noise and cascading effects. This article examines intelligent root cause analysis techniques for multi-layer enterprise systems, focusing on three foundational pillars: distributed tracing to capture end-to-end execution context, data-driven anomaly correlation to identify statistically significant failure indicators, and graph-based dependency modeling to represent and reason about system structure and failure propagation. Drawing on established research and industry practices published between 2000 and 2021, we synthesize how these complementary techniques collectively enable faster, more accurate, and increasingly automated root cause identification, reducing mean time to resolution and improving operational reliability in complex enterprise environments.