| Main Author: | Hu, Yiming |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Distributed, Parallel, and Cluster Computing |
| Online Access: | https://arxiv.org/abs/2510.16896 |
| _version_ | 1866917048861327360 |
|---|---|
| author | Hu, Yiming |
| author_facet | Hu, Yiming |
| contents | Two-Phase TMR conserves energy by partitioning redundancy operations into two stages and making the execution of the third task copy optional, yet it remains susceptible to permanent faults. Reactive-TMR (R-TMR) counters this by isolating faulty cores, handling both transient and permanent faults. However, the lightweight hardware required by R-TMR not only increases complexity but also becomes a single point of failure itself. To bypass isolated node constraints, this paper proposes a Fault Tolerance and Isolation TMR (FTI-TMR) algorithm for interconnected multicore systems. The algorithm constructs a stability metric to identify the most reliable nodes in the system, which then perform periodic diagnostics to isolate permanent faults. Experimental results show that FTI-TMR reduces task workload by approximately 30% compared with baseline TMR while achieving higher permanent fault coverage. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2510_16896 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | FTI-TMR: A Fault Tolerance and Isolation Algorithm for Interconnected Multicore Systems |
| topic | Distributed, Parallel, and Cluster Computing |
| url | https://arxiv.org/abs/2510.16896 |
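
The abstract describes Two-Phase TMR as running two task copies first and executing the third copy only when needed. The sketch below illustrates that general idea only, and is not the paper's FTI-TMR algorithm: the record gives no details of the stability metric or the periodic diagnostics, so neither is modeled here. The function name `two_phase_tmr`, the representation of task copies as zero-argument callables, and the assumption that results are hashable and comparable with `==` are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple


def two_phase_tmr(copies: Sequence[Callable[[], Hashable]]) -> Tuple[Hashable, int]:
    """Hypothetical two-phase vote over three redundant task copies.

    Phase 1 runs the first two copies and accepts their result if they agree.
    Phase 2 runs the third copy only on disagreement and takes a majority vote.
    Returns (result, number_of_copies_executed).
    """
    if len(copies) != 3:
        raise ValueError("expected exactly three task copies")

    # Phase 1: execute two copies and compare their outputs.
    first, second = copies[0](), copies[1]()
    if first == second:
        return first, 2  # third copy skipped, which is where the savings come from

    # Phase 2: the copies disagree, so execute the third copy as a tie-breaker.
    third = copies[2]()
    value, votes = Counter([first, second, third]).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among the three copies")
    return value, 3


# Illustrative task copies: one healthy, one returning a corrupted value.
healthy = lambda: 42
faulty = lambda: 7

print(two_phase_tmr([healthy, healthy, faulty]))  # agreement in phase 1
print(two_phase_tmr([healthy, faulty, healthy]))  # third copy breaks the tie
```

In the fault-free case only two copies run. A core with a permanent fault would force the third copy on every task, which matches the weakness the abstract attributes to Two-Phase TMR.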