Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Ressa, Eugenio, Marchisio, Alberto, Martina, Maurizio, Masera, Guido, Shafique, Muhammad
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2402.09780
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910928532930560
author	Ressa, Eugenio Marchisio, Alberto Martina, Maurizio Masera, Guido Shafique, Muhammad
author_facet	Ressa, Eugenio Marchisio, Alberto Martina, Maurizio Masera, Guido Shafique, Muhammad
contents	The Continuous Learning (CL) paradigm consists of continuously evolving the parameters of the Deep Neural Network (DNN) model to progressively learn to perform new tasks without reducing the performance on previous tasks, i.e., avoiding the so-called catastrophic forgetting. However, the DNN parameter update in CL-based autonomous systems is extremely resource-hungry. The existing DNN accelerators cannot be directly employed in CL because they only support the execution of the forward propagation. Only a few prior architectures execute the backpropagation and weight update, but they lack the control and management for CL. Towards this, we design a hardware architecture, TinyCL, to perform CL on resource-constrained autonomous systems. It consists of a processing unit that executes both forward and backward propagation, and a control unit that manages memory-based CL workload. To minimize the memory accesses, the sliding window of the convolutional layer moves in a snake-like fashion. Moreover, the Multiply-and-Accumulate units can be reconfigured at runtime to execute different operations. As per our knowledge, our proposed TinyCL represents the first hardware accelerator that executes CL on autonomous systems. We synthesize the complete TinyCL architecture in a 65 nm CMOS technology node with the conventional ASIC design flow. It executes 1 epoch of training on a Conv + ReLU + Dense model on the CIFAR10 dataset in 1.76 s, while 1 training epoch of the same model using an Nvidia Tesla P100 GPU takes 103 s, thus achieving a 58x speedup, consuming 86 mW in a 4.74 mm2 die.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_09780
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	TinyCL: An Efficient Hardware Architecture for Continual Learning on Autonomous Systems Ressa, Eugenio Marchisio, Alberto Martina, Maurizio Masera, Guido Shafique, Muhammad Machine Learning The Continuous Learning (CL) paradigm consists of continuously evolving the parameters of the Deep Neural Network (DNN) model to progressively learn to perform new tasks without reducing the performance on previous tasks, i.e., avoiding the so-called catastrophic forgetting. However, the DNN parameter update in CL-based autonomous systems is extremely resource-hungry. The existing DNN accelerators cannot be directly employed in CL because they only support the execution of the forward propagation. Only a few prior architectures execute the backpropagation and weight update, but they lack the control and management for CL. Towards this, we design a hardware architecture, TinyCL, to perform CL on resource-constrained autonomous systems. It consists of a processing unit that executes both forward and backward propagation, and a control unit that manages memory-based CL workload. To minimize the memory accesses, the sliding window of the convolutional layer moves in a snake-like fashion. Moreover, the Multiply-and-Accumulate units can be reconfigured at runtime to execute different operations. As per our knowledge, our proposed TinyCL represents the first hardware accelerator that executes CL on autonomous systems. We synthesize the complete TinyCL architecture in a 65 nm CMOS technology node with the conventional ASIC design flow. It executes 1 epoch of training on a Conv + ReLU + Dense model on the CIFAR10 dataset in 1.76 s, while 1 training epoch of the same model using an Nvidia Tesla P100 GPU takes 103 s, thus achieving a 58x speedup, consuming 86 mW in a 4.74 mm2 die.
title	TinyCL: An Efficient Hardware Architecture for Continual Learning on Autonomous Systems
topic	Machine Learning
url	https://arxiv.org/abs/2402.09780

Similar Items