Guardado en:
Detalles Bibliográficos
Autores principales: Gaikwad, Madhava, Gandhi, Abhishek
Formato: Preprint
Publicado: 2025
Materias:
Acceso en línea:https://arxiv.org/abs/2508.16119
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
Tabla de Contenidos:
  • We present ANSC, a probabilistic capacity health scoring framework for hyperscale datacenter fabrics. While existing alerting systems detect individual device or link failures, they do not capture the aggregate risk of cascading capacity shortfalls. ANSC provides a color-coded scoring system that indicates the urgency of issues \emph{not solely by current impact, but by the probability of imminent capacity violations}. Our system accounts for both current residual capacity and the probability of additional failures, normalized at datacenter and regional level. We demonstrate that ANSC enables operators to prioritize remediation across more than 400 datacenters and 60 regions, reducing noise and aligning SRE focus on the most critical risks.