Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	M, Ajay Kumar, O'Mahoney, Cian, Werle, Pedro Kreutz, Shanker, Shreejith, Nikolopoulos, Dimitrios S., Ji, Bo, Vandierendonck, Hans, John, Deepu
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2508.01800
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866913973122629632
author	M, Ajay Kumar O'Mahoney, Cian Werle, Pedro Kreutz Shanker, Shreejith Nikolopoulos, Dimitrios S. Ji, Bo Vandierendonck, Hans John, Deepu
author_facet	M, Ajay Kumar O'Mahoney, Cian Werle, Pedro Kreutz Shanker, Shreejith Nikolopoulos, Dimitrios S. Ji, Bo Vandierendonck, Hans John, Deepu
contents	Deploying deep neural networks (DNNs) on resource-constrained IoT devices remains a challenging problem, often requiring hardware modifications tailored to individual AI models. Existing accelerator-generation tools, such as AMD's FINN, do not adequately address extreme resource limitations faced by IoT endpoints operating in bare-metal environments without an operating system (OS). To overcome these constraints, we propose MARVEL-an automated, end-to-end framework that generates custom RISC-V ISA extensions tailored to specific DNN model classes, with a primary focus on convolutional neural networks (CNNs). The proposed method profiles high-level DNN representations in Python and generates an ISA-extended RISC-V core with associated compiler tools for efficient deployment. The flow leverages (1) Apache TVM for translating high-level Python-based DNN models into optimized C code, (2) Synopsys ASIP Designer for identifying compute-intensive kernels, modeling, and generating a custom RISC-V and (3) Xilinx Vivado for FPGA implementation. Beyond a model class specific RISC-V, our approach produces an optimized bare-metal C implementation, eliminating the need for an OS or extensive software dependencies. Unlike conventional deployment pipelines relying on TensorFlow/PyTorch runtimes, our solution enables seamless execution in highly resource-constrained environments. We evaluated the flow on popular DNN models such as LeNet-5*, MobileNetV1, ResNet50, VGG16, MobileNetV2 and DenseNet121 using the Synopsys trv32p3 RISC-V core as a baseline. Results show a 2x speedup in inference and upto 2x reduction in energy per inference at a 28.23% area overhead when implemented on an AMD Zynq UltraScale+ ZCU104 FPGA platform.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_01800
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI M, Ajay Kumar O'Mahoney, Cian Werle, Pedro Kreutz Shanker, Shreejith Nikolopoulos, Dimitrios S. Ji, Bo Vandierendonck, Hans John, Deepu Hardware Architecture Deploying deep neural networks (DNNs) on resource-constrained IoT devices remains a challenging problem, often requiring hardware modifications tailored to individual AI models. Existing accelerator-generation tools, such as AMD's FINN, do not adequately address extreme resource limitations faced by IoT endpoints operating in bare-metal environments without an operating system (OS). To overcome these constraints, we propose MARVEL-an automated, end-to-end framework that generates custom RISC-V ISA extensions tailored to specific DNN model classes, with a primary focus on convolutional neural networks (CNNs). The proposed method profiles high-level DNN representations in Python and generates an ISA-extended RISC-V core with associated compiler tools for efficient deployment. The flow leverages (1) Apache TVM for translating high-level Python-based DNN models into optimized C code, (2) Synopsys ASIP Designer for identifying compute-intensive kernels, modeling, and generating a custom RISC-V and (3) Xilinx Vivado for FPGA implementation. Beyond a model class specific RISC-V, our approach produces an optimized bare-metal C implementation, eliminating the need for an OS or extensive software dependencies. Unlike conventional deployment pipelines relying on TensorFlow/PyTorch runtimes, our solution enables seamless execution in highly resource-constrained environments. We evaluated the flow on popular DNN models such as LeNet-5*, MobileNetV1, ResNet50, VGG16, MobileNetV2 and DenseNet121 using the Synopsys trv32p3 RISC-V core as a baseline. Results show a 2x speedup in inference and upto 2x reduction in energy per inference at a 28.23% area overhead when implemented on an AMD Zynq UltraScale+ ZCU104 FPGA platform.
title	MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI
topic	Hardware Architecture
url	https://arxiv.org/abs/2508.01800

Similar Items