| Main Authors: | Rösti, André, Franz, Michael |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.03083 |
Similar Items
Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)
GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)
Bare-Metal RISC-V + NVDLA SoC for Efficient Deep Learning Inference
by: Kumar, Vineet, et al.
Published: (2025)
ReGate: Enabling Power Gating in Neural Processing Units
by: Xue, Yuqi, et al.
Published: (2025)
AMD Versal Implementations of FAM and SSCA Estimators
by: Li, Carol Jingyi, et al.
Published: (2025)
NeuroAI Temporal Neural Networks (NeuTNNs): Microarchitecture and Design Framework for Specialized Neuromorphic Processing Units
by: Venkatachalam, Shanmuga, et al.
Published: (2026)
RTGPU: Real-Time Computing with Graphics Processing Units
by: Gheibi-Fetrat, Atiyeh, et al.
Published: (2025)
AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines
by: Danopoulos, Dimitrios, et al.
Published: (2025)
FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
by: Li, Xinyi, et al.
Published: (2024)
Instruction-Based Coordination of Heterogeneous Processing Units for Acceleration of DNN Inference
by: Petropoulos, Anastasios, et al.
Published: (2025)
From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR
by: Wang, Erwei, et al.
Published: (2025)
DPUConfig: Optimizing ML Inference in FPGAs Using Reinforcement Learning
by: Patras, Alexandros, et al.
Published: (2026)
RPU -- A Reasoning Processing Unit
by: Adiletta, Matthew, et al.
Published: (2026)
Dynamic Power Control in a Hardware Neural Network with Error-Configurable MAC Units
by: Ghaderi, Maedeh, et al.
Published: (2024)
e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications
by: Machetti, Simone, et al.
Published: (2025)
Taming Performance Variability caused by Client-Side Hardware Configuration
by: Antoniou, Georgia, et al.
Published: (2024)
ATiM: Autotuning Tensor Programs for Processing-in-DRAM
by: Shin, Yongwon, et al.
Published: (2024)
PPU: Design and Implementation of a Pipelined Full Posit Processing Unit
by: Rossi, Federico, et al.
Published: (2023)
Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats
by: Noh, Seock-Hwan, et al.
Published: (2025)
DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2023)
Instruction Scheduling in the Saturn Vector Unit
by: Zhao, Jerry, et al.
Published: (2024)
MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores
by: Bertaccini, Luca, et al.
Published: (2022)
Wet TinyML: Chemical Neural Network Using Gene Regulation and Cell Plasticity
by: Somathilaka, Samitha, et al.
Published: (2024)
New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
by: Oliveira, Geraldo F.
Published: (2025)
FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)
Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
by: Feng, Dahu, et al.
Published: (2025)
TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator
by: Vungarala, Deepak, et al.
Published: (2025)
Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates
by: Liu, Yang, et al.
Published: (2025)
Empowering Vector Architectures for ML: The CAMP Architecture for Matrix Multiplication
by: Nojehdeh, Mohammadreza Esmali, et al.
Published: (2025)
ML-based AIG Timing Prediction to Enhance Logic Optimization
by: Jiang, Wenjing, et al.
Published: (2024)
An Energy-Efficient Approximate Posit Multiply-Divide Unit
by: Thotli, Rishi, et al.
Published: (2026)
Decentor-V: Lightweight ML Training on Low-Power RISC-V Edge Devices
by: Ribeiro, Marcelo, et al.
Published: (2025)
CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance
by: Oh, Dongsuk, et al.
Published: (2025)
How to keep pushing ML accelerator performance? Know your rooflines!
by: Verhelst, Marian, et al.
Published: (2025)
GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units
by: Das, Arghadip, et al.
Published: (2025)
MTU: The Multifunction Tree Unit for Accelerating Zero-Knowledge Proofs
by: Mo, Jianqiao, et al.
Published: (2025)
Table-Lookup MAC: Scalable Processing of Quantised Neural Networks in FPGA Soft Logic
by: Gerlinghoff, Daniel, et al.
Published: (2024)
TRACE: Unlocking Effective CXL Bandwidth via Lossless Compression and Precision Scaling
by: Xie, Rui, et al.
Published: (2025)
Online Training and Inference System on Edge FPGA Using Delayed Feedback Reservoir
by: Ikeda, Sosei, et al.
Published: (2025)
From PyTorch to Calyx: An Open-Source Compiler Toolchain for ML Accelerators
by: Xie, Jiahan, et al.
Published: (2025)