Bibliographic Details
Main Authors: Reiter, Hendrik, Hamid, Ahmad Rzgar, Schlösser, Florian, Kjærgaard, Mikkel Baun, Hasselbring, Wilhelm
Format: Preprint
Published: 2025
Subjects: Performance
Online Access: https://arxiv.org/abs/2507.22702
author Reiter, Hendrik
Hamid, Ahmad Rzgar
Schlösser, Florian
Kjærgaard, Mikkel Baun
Hasselbring, Wilhelm
contents Edge computing offers significant advantages for real-time data processing tasks, such as object recognition, by reducing network latency and bandwidth usage. However, edge environments are susceptible to various types of faults. A remediator is an automated software component designed to adjust the configuration parameters of a software service dynamically. Its primary function is to keep the service's operational state within predefined Service Level Objectives by applying corrective actions in response to deviations from these objectives. Remediators can be built on the Kubernetes container orchestration tool and implement remediation strategies such as rescheduling or adjusting application parameters. However, there is currently no method to compare these remediation strategies fairly. This paper introduces Ecoscape, a comprehensive benchmark designed to evaluate the performance of remediation strategies in fault-prone environments. Using Chaos Engineering techniques, Ecoscape simulates realistic fault scenarios and provides a quantifiable score to assess the efficacy of different remediation approaches. In addition, it can be configured to support domain-specific Service Level Objectives. We demonstrate the capabilities of Ecoscape in edge machine learning inference, offering a clear framework for optimizing fault tolerance in these systems without the need for a physical edge testbed.
format Preprint
id arxiv_https___arxiv_org_abs_2507_22702
institution arXiv
publishDate 2025
record_format arxiv
title Ecoscape: Fault Tolerance Benchmark for Adaptive Remediation Strategies in Real-Time Edge ML
topic Performance
url https://arxiv.org/abs/2507.22702
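The remediator concept described in the abstract (a component that applies corrective actions when a service deviates from its Service Level Objectives) can be illustrated with a minimal sketch. This is not Ecoscape's implementation or scoring method. It assumes a hypothetical latency SLO and a hypothetical tunable application parameter (the input resolution of an object detector), and it leaves out the Kubernetes side (rescheduling, applying configuration) entirely. The names ServiceState, SLO, RESOLUTIONS and remediate, as well as the 0.5 headroom factor, are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical set of allowed values for the tunable application parameter,
# ordered from cheapest (fastest) to most expensive (most accurate).
RESOLUTIONS = [320, 416, 512, 640]


@dataclass
class ServiceState:
    """Hypothetical snapshot of an edge inference service."""
    p95_latency_ms: float
    input_resolution: int  # must be one of RESOLUTIONS


@dataclass
class SLO:
    """Hypothetical latency Service Level Objective."""
    max_p95_latency_ms: float


def remediate(state: ServiceState, slo: SLO) -> int:
    """Return the input resolution to use after one remediation step.

    If the latency SLO is violated, step down to the next smaller resolution
    (the corrective action). If latency is well below the limit (under half
    of it, an assumed headroom threshold), step back up to recover accuracy.
    Otherwise keep the current setting.
    """
    idx = RESOLUTIONS.index(state.input_resolution)
    if state.p95_latency_ms > slo.max_p95_latency_ms and idx > 0:
        return RESOLUTIONS[idx - 1]
    if state.p95_latency_ms < 0.5 * slo.max_p95_latency_ms and idx < len(RESOLUTIONS) - 1:
        return RESOLUTIONS[idx + 1]
    return state.input_resolution


if __name__ == "__main__":
    slo = SLO(max_p95_latency_ms=100.0)
    # SLO violated at 640 px: the sketch steps down to 512 px.
    print(remediate(ServiceState(p95_latency_ms=140.0, input_resolution=640), slo))
```

In a real deployment this step would run repeatedly as a control loop against measured metrics. A benchmark in the spirit of Ecoscape would inject faults and then compare how well different strategies of this kind keep the service within its SLOs.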