Staff View: SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction

Bibliographic Details
Main Authors:	Toupas, Petros; Yu, Zhewen; Bouganis, Christos-Savvas; Tzovaras, Dimitrios
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2403.18921

_version_	1866916182027665408
author	Toupas, Petros; Yu, Zhewen; Bouganis, Christos-Savvas; Tzovaras, Dimitrios
author_facet	Toupas, Petros; Yu, Zhewen; Bouganis, Christos-Savvas; Tzovaras, Dimitrios
contents	Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in numerous vision tasks. However, their high processing requirements necessitate efficient hardware acceleration to meet the application's performance targets. On FPGAs, streaming-based dataflow architectures are often adopted, as significant performance gains can be achieved through layer-wise pipelining and reduced off-chip memory access by retaining data on-chip. However, modern topologies, such as the UNet, YOLO, and X3D models, utilise long skip connections, which require significant on-chip storage and thus limit the performance achieved by such system architectures. The paper addresses this limitation by introducing mechanisms that evict weights and activations to off-chip memory along the computational pipeline, taking into account the available compute and memory resources. The proposed mechanism is incorporated into an existing toolflow, expanding the design space by utilising off-chip memory as a buffer. This enables the mapping of such modern CNNs to devices with limited on-chip memory under the streaming architecture design approach. SMOF has demonstrated the capacity to deliver competitive and, in some cases, state-of-the-art performance across a spectrum of computer vision tasks, achieving up to 10.65× throughput improvement compared to previous works.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_18921
institution	arXiv
publishDate	2024
record_format	arxiv
title	SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction
topic	Hardware Architecture; Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2403.18921

Similar Items