Bibliographic Details
Main Author: Hu, Yiming
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2510.16896
author Hu, Yiming
author_facet Hu, Yiming
contents Two-Phase TMR conserves energy by partitioning redundancy operations into two stages and making the execution of the third task copy optional, yet it remains susceptible to permanent faults. Reactive-TMR (R-TMR) counters this by isolating faulty cores, handling both transient and permanent faults. However, the lightweight hardware required by R-TMR not only increases complexity but also becomes a single point of failure itself. To bypass isolated node constraints, this paper proposes a Fault Tolerance and Isolation TMR (FTI-TMR) algorithm for interconnected multicore systems. The algorithm constructs a stability metric to identify the most reliable nodes in the system, which then perform periodic diagnostics to isolate permanent faults. Experimental results show that FTI-TMR reduces task workload by approximately 30% compared with baseline TMR while achieving higher permanent fault coverage.
format Preprint
id arxiv_https___arxiv_org_abs_2510_16896
institution arXiv
publishDate 2025
record_format arxiv
title FTI-TMR: A Fault Tolerance and Isolation Algorithm for Interconnected Multicore Systems
topic Distributed, Parallel, and Cluster Computing
url https://arxiv.org/abs/2510.16896
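
The abstract describes Two-Phase TMR as running two task copies first and executing the third copy only when needed. A minimal sketch of that general idea follows, assuming the third copy is triggered by a mismatch between the first two results. It is not the paper's FTI-TMR algorithm: the stability metric and the periodic diagnostics are not specified in this record, so they are left out. The names two_phase_tmr and make_task, and the integer core identifiers, are hypothetical and serve only as illustration.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def two_phase_tmr(task: Callable[[int], T],
                  cores: Sequence[int] = (0, 1, 2)) -> Tuple[T, int]:
    """Run `task` on two cores and use the third only on disagreement.

    Returns (agreed_result, number_of_copies_executed).
    Assumption: a mismatch between the first two copies is what
    triggers the optional third copy.
    """
    # Phase 1: execute two copies and compare their results.
    first = task(cores[0])
    second = task(cores[1])
    if first == second:
        return first, 2  # third copy skipped, saving its energy cost

    # Phase 2: the copies disagree, so a third copy breaks the tie.
    third = task(cores[2])
    if third == first or third == second:
        return third, 3
    raise RuntimeError("no two copies agree; fault cannot be masked")


def make_task(faulty_core=None) -> Callable[[int], int]:
    """Hypothetical task: returns 42, or -1 when run on the faulty core."""
    def task(core: int) -> int:
        return -1 if core == faulty_core else 42
    return task


if __name__ == "__main__":
    print(two_phase_tmr(make_task()))               # fault-free run
    print(two_phase_tmr(make_task(faulty_core=1)))  # core 1 is faulty
```

In the fault-free run only two copies execute, which is where the energy saving comes from. A core that always returns a wrong value forces the third copy on every task it takes part in, which illustrates why the scheme on its own does not address permanent faults.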