:: Library Catalog

Bibliographic Details
Main Authors:	Rösti, André; Franz, Michael
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2504.03083

Similar Items

Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)

GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)

Bare-Metal RISC-V + NVDLA SoC for Efficient Deep Learning Inference
by: Kumar, Vineet, et al.
Published: (2025)

ReGate: Enabling Power Gating in Neural Processing Units
by: Xue, Yuqi, et al.
Published: (2025)

AMD Versal Implementations of FAM and SSCA Estimators
by: Li, Carol Jingyi, et al.
Published: (2025)

NeuroAI Temporal Neural Networks (NeuTNNs): Microarchitecture and Design Framework for Specialized Neuromorphic Processing Units
by: Venkatachalam, Shanmuga, et al.
Published: (2026)

RTGPU: Real-Time Computing with Graphics Processing Units
by: Gheibi-Fetrat, Atiyeh, et al.
Published: (2025)

AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines
by: Danopoulos, Dimitrios, et al.
Published: (2025)

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
by: Li, Xinyi, et al.
Published: (2024)

Instruction-Based Coordination of Heterogeneous Processing Units for Acceleration of DNN Inference
by: Petropoulos, Anastasios, et al.
Published: (2025)

From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR
by: Wang, Erwei, et al.
Published: (2025)

DPUConfig: Optimizing ML Inference in FPGAs Using Reinforcement Learning
by: Patras, Alexandros, et al.
Published: (2026)

RPU -- A Reasoning Processing Unit
by: Adiletta, Matthew, et al.
Published: (2026)

Dynamic Power Control in a Hardware Neural Network with Error-Configurable MAC Units
by: Ghaderi, Maedeh, et al.
Published: (2024)

e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications
by: Machetti, Simone, et al.
Published: (2025)

Taming Performance Variability caused by Client-Side Hardware Configuration
by: Antoniou, Georgia, et al.
Published: (2024)

ATiM: Autotuning Tensor Programs for Processing-in-DRAM
by: Shin, Yongwon, et al.
Published: (2024)

PPU: Design and Implementation of a Pipelined Full Posit Processing Unit
by: Rossi, Federico, et al.
Published: (2023)

Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats
by: Noh, Seock-Hwan, et al.
Published: (2025)

DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2023)

Instruction Scheduling in the Saturn Vector Unit
by: Zhao, Jerry, et al.
Published: (2024)

MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores
by: Bertaccini, Luca, et al.
Published: (2022)

Wet TinyML: Chemical Neural Network Using Gene Regulation and Cell Plasticity
by: Somathilaka, Samitha, et al.
Published: (2024)

New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
by: Oliveira, Geraldo F.
Published: (2025)

FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)

Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
by: Feng, Dahu, et al.
Published: (2025)

TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator
by: Vungarala, Deepak, et al.
Published: (2025)

Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates
by: Liu, Yang, et al.
Published: (2025)

Empowering Vector Architectures for ML: The CAMP Architecture for Matrix Multiplication
by: Nojehdeh, Mohammadreza Esmali, et al.
Published: (2025)

ML-based AIG Timing Prediction to Enhance Logic Optimization
by: Jiang, Wenjing, et al.
Published: (2024)

An Energy-Efficient Approximate Posit Multiply-Divide Unit
by: Thotli, Rishi, et al.
Published: (2026)

Decentor-V: Lightweight ML Training on Low-Power RISC-V Edge Devices
by: Ribeiro, Marcelo, et al.
Published: (2025)

CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance
by: Oh, Dongsuk, et al.
Published: (2025)

How to keep pushing ML accelerator performance? Know your rooflines!
by: Verhelst, Marian, et al.
Published: (2025)

GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units
by: Das, Arghadip, et al.
Published: (2025)

MTU: The Multifunction Tree Unit for Accelerating Zero-Knowledge Proofs
by: Mo, Jianqiao, et al.
Published: (2025)

Table-Lookup MAC: Scalable Processing of Quantised Neural Networks in FPGA Soft Logic
by: Gerlinghoff, Daniel, et al.
Published: (2024)

TRACE: Unlocking Effective CXL Bandwidth via Lossless Compression and Precision Scaling
by: Xie, Rui, et al.
Published: (2025)

Online Training and Inference System on Edge FPGA Using Delayed Feedback Reservoir
by: Ikeda, Sosei, et al.
Published: (2025)

From PyTorch to Calyx: An Open-Source Compiler Toolchain for ML Accelerators
by: Xie, Jiahan, et al.
Published: (2025)