Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Lenzen, Christoph, Srinivas, Shreyas
Format:	Preprint
Published:	2023
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2301.05073
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916720561618944
author	Lenzen, Christoph Srinivas, Shreyas
author_facet	Lenzen, Christoph Srinivas, Shreyas
contents	Gradient clock synchronization (GCS) algorithms minimize the worst-case clock offset between the nodes in a distributed network of diameter $D$ and size $n$. They achieve optimal offsets of $Θ(\log D)$ locally, i.e. between adjacent nodes as shown by Lenzen et al., and $Θ(D)$ globally as shown by Biaz and Welch. As demonstrated in the work of Bund et al., this is a highly promising approach for improved clocking schemes for large-scale synchronous Systems-on-Chip (SoC). Unfortunately, in large systems, faults hinder their practical use. State of the art fault-tolerant, as presented by Bund et al., has a drawback that is fatal in this setting: It relies on node and edge replication. For $f=1$, this translates to at least $16$-fold edge replication and high degree nodes, far from the optimum of $2f+1=3$ for tolerating up to $f$ faulty neighbors. In this work, we present a self-stabilizing GCS algorithm for a grid-like directed graph with optimal node in- and out-degrees of $3$ that tolerates $1$ faulty in-neighbor. If nodes fail with independent probability $p\in o(n^{-1/2})$, it achieves asymptotically optimal local skew of $Θ(\log D)$ with probability $1-o(1)$; this holds under general worst-case assumptions on link delay and clock speed variations, provided they change slowly relative to the speed of the system. The failure probability is the largest possible ensuring that with probabity $1-o(1)$ for each node at most one in-neighbor fails. As modern hardware is clocked at gigahertz speeds and the algorithm can simultaneously sustain a constant number of arbitrary changes due to faults in each clock cycle, this results in sufficient robustness to dramatically increase the size of reliable synchronously clocked SoCs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2301_05073
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Clock Distribution with Gradient TRIX Lenzen, Christoph Srinivas, Shreyas Distributed, Parallel, and Cluster Computing Gradient clock synchronization (GCS) algorithms minimize the worst-case clock offset between the nodes in a distributed network of diameter $D$ and size $n$. They achieve optimal offsets of $Θ(\log D)$ locally, i.e. between adjacent nodes as shown by Lenzen et al., and $Θ(D)$ globally as shown by Biaz and Welch. As demonstrated in the work of Bund et al., this is a highly promising approach for improved clocking schemes for large-scale synchronous Systems-on-Chip (SoC). Unfortunately, in large systems, faults hinder their practical use. State of the art fault-tolerant, as presented by Bund et al., has a drawback that is fatal in this setting: It relies on node and edge replication. For $f=1$, this translates to at least $16$-fold edge replication and high degree nodes, far from the optimum of $2f+1=3$ for tolerating up to $f$ faulty neighbors. In this work, we present a self-stabilizing GCS algorithm for a grid-like directed graph with optimal node in- and out-degrees of $3$ that tolerates $1$ faulty in-neighbor. If nodes fail with independent probability $p\in o(n^{-1/2})$, it achieves asymptotically optimal local skew of $Θ(\log D)$ with probability $1-o(1)$; this holds under general worst-case assumptions on link delay and clock speed variations, provided they change slowly relative to the speed of the system. The failure probability is the largest possible ensuring that with probabity $1-o(1)$ for each node at most one in-neighbor fails. As modern hardware is clocked at gigahertz speeds and the algorithm can simultaneously sustain a constant number of arbitrary changes due to faults in each clock cycle, this results in sufficient robustness to dramatically increase the size of reliable synchronously clocked SoCs.
title	Clock Distribution with Gradient TRIX
topic	Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2301.05073

Similar Items