Bibliographic Details
Main Authors: M, Ajay Kumar, O'Mahoney, Cian, Werle, Pedro Kreutz, Shanker, Shreejith, Nikolopoulos, Dimitrios S., Ji, Bo, Vandierendonck, Hans, John, Deepu
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2508.01800
author M, Ajay Kumar
O'Mahoney, Cian
Werle, Pedro Kreutz
Shanker, Shreejith
Nikolopoulos, Dimitrios S.
Ji, Bo
Vandierendonck, Hans
John, Deepu
contents Deploying deep neural networks (DNNs) on resource-constrained IoT devices remains a challenging problem, often requiring hardware modifications tailored to individual AI models. Existing accelerator-generation tools, such as AMD's FINN, do not adequately address the extreme resource limitations faced by IoT endpoints operating in bare-metal environments without an operating system (OS). To overcome these constraints, we propose MARVEL, an automated, end-to-end framework that generates custom RISC-V ISA extensions tailored to specific DNN model classes, with a primary focus on convolutional neural networks (CNNs). The proposed method profiles high-level DNN representations in Python and generates an ISA-extended RISC-V core with associated compiler tools for efficient deployment. The flow leverages (1) Apache TVM for translating high-level Python-based DNN models into optimized C code, (2) Synopsys ASIP Designer for identifying compute-intensive kernels and for modeling and generating a custom RISC-V core, and (3) Xilinx Vivado for FPGA implementation. Beyond a model-class-specific RISC-V core, our approach produces an optimized bare-metal C implementation, eliminating the need for an OS or extensive software dependencies. Unlike conventional deployment pipelines relying on TensorFlow/PyTorch runtimes, our solution enables seamless execution in highly resource-constrained environments. We evaluated the flow on popular DNN models such as LeNet-5*, MobileNetV1, ResNet50, VGG16, MobileNetV2 and DenseNet121 using the Synopsys trv32p3 RISC-V core as a baseline. Results show a 2x speedup in inference and up to 2x reduction in energy per inference at a 28.23% area overhead when implemented on an AMD Zynq UltraScale+ ZCU104 FPGA platform.
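The headline figures in the abstract can be combined into a rough cost/benefit estimate. The sketch below is only illustrative arithmetic: the "speedup per unit area" metric and the function names are assumptions introduced here, not quantities reported by the authors, and the 2x energy figure is an upper bound ("up to") rather than a typical value.

```python
# Figures quoted in the abstract, relative to the trv32p3 baseline core.
SPEEDUP = 2.0            # inference speedup
ENERGY_REDUCTION = 2.0   # "up to" 2x lower energy per inference (best case)
AREA_OVERHEAD = 0.2823   # 28.23% additional area


def relative_area(overhead: float) -> float:
    """Area of the extended core as a multiple of the baseline area."""
    return 1.0 + overhead


def speedup_per_area(speedup: float, overhead: float) -> float:
    """Hypothetical figure of merit: speedup divided by relative area."""
    return speedup / relative_area(overhead)


def relative_energy(reduction: float) -> float:
    """Energy per inference as a fraction of the baseline energy."""
    return 1.0 / reduction


if __name__ == "__main__":
    print(f"relative area:      {relative_area(AREA_OVERHEAD):.4f}x")
    print(f"speedup per area:   {speedup_per_area(SPEEDUP, AREA_OVERHEAD):.2f}x")
    print(f"best-case energy:   {relative_energy(ENERGY_REDUCTION):.2f}x of baseline")
```

Under these assumptions, the extended core occupies about 1.28x the baseline area and delivers roughly 1.56x the speedup per unit area, with best-case energy per inference at half the baseline.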
format Preprint
id arxiv_https___arxiv_org_abs_2508_01800
institution arXiv
publishDate 2025
record_format arxiv
title MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI
topic Hardware Architecture
url https://arxiv.org/abs/2508.01800