:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Yushan; Aloufi, Ranya; Cadet, Xavier; Zhao, Yuchen; Barnaghi, Payam; Haddadi, Hamed
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Hardware Architecture
Online Access:	https://arxiv.org/abs/2403.08040

Similar Items

Benchmarking Ultra-Low-Power μNPUs
by: Millar, Josh, et al.
Published: (2025)

Energy-Aware Deep Learning on Resource-Constrained Hardware
by: Millar, Josh, et al.
Published: (2025)

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
by: Zheng, Size, et al.
Published: (2024)

MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs
by: Gong, Junfeng, et al.
Published: (2024)

PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
by: Yan, Minghao, et al.
Published: (2023)

Decentor-V: Lightweight ML Training on Low-Power RISC-V Edge Devices
by: Ribeiro, Marcelo, et al.
Published: (2025)

Distributed Inference with Minimal Off-Chip Traffic for Transformers on Low-Power MCUs
by: Bochem, Severin, et al.
Published: (2024)

Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks
by: Wu, Qizhe, et al.
Published: (2024)

TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
by: Yang, Jianlei, et al.
Published: (2023)

A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
by: Huang, Longwei, et al.
Published: (2023)

Accelerating PoT Quantization on Edge Devices
by: Saha, Rappy, et al.
Published: (2024)

An Early Experience with Confidential Computing Architecture for On-Device Model Protection
by: Abdollahi, Sina, et al.
Published: (2025)

Designing Efficient LLM Accelerators for Edge Devices
by: Haris, Jude, et al.
Published: (2024)

Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search
by: Song, Chang Eun, et al.
Published: (2025)

TeLLMe: An Energy-Efficient Ternary LLM Accelerator for Prefilling and Decoding on Edge FPGAs
by: Qiao, Ye, et al.
Published: (2025)

Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices
by: Choi, Dawon, et al.
Published: (2026)

On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration
by: Xiang, Maoyang, et al.
Published: (2025)

From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference
by: Ganti, Ravindra, et al.
Published: (2026)

MAGE: A Multi-Agent Engine for Automated RTL Code Generation
by: Zhao, Yujie, et al.
Published: (2024)

Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
by: Yun, Sungmin, et al.
Published: (2024)

Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs
by: Wu, Qizhe, et al.
Published: (2025)

ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
by: İslamoğlu, Gamze, et al.
Published: (2023)

Exploration of Low-Power Flexible Stress Monitoring Classifiers for Conformal Wearables
by: Afentaki, Florentia, et al.
Published: (2025)

Low-Cost FlashAttention with Fused Exponential and Multiplication Hardware Operators
by: Alexandridis, Kosmas, et al.
Published: (2025)

GCN-ABFT: Low-Cost Online Error Checking for Graph Convolutional Networks
by: Peltekis, Christodoulos, et al.
Published: (2024)

Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow
by: Hsiung, Ching-Lin, et al.
Published: (2025)

Efficient FPGA Implementation of Time-Domain Popcount for Low-Complexity Machine Learning
by: Duan, Shengyu, et al.
Published: (2025)

MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation
by: Natesh, Vikas, et al.
Published: (2025)

Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers
by: Scherer, Moritz, et al.
Published: (2024)

Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations
by: Fyon, Arthur, et al.
Published: (2026)

MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
by: Leon, Vasileios, et al.
Published: (2025)

LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics
by: Que, Zhiqiang, et al.
Published: (2022)

HW-SW Optimization of DNNs for Privacy-preserving People Counting on Low-resolution Infrared Arrays
by: Risso, Matteo, et al.
Published: (2024)

Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications
by: Weng, Olivia, et al.
Published: (2024)

VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
by: Wang, Run, et al.
Published: (2025)

PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
by: Andronic, Marta, et al.
Published: (2023)

SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs
by: Zhang, Jintao, et al.
Published: (2026)

Bespoke Co-processor for Energy-Efficient Health Monitoring on RISC-V-based Flexible Wearables
by: Vergos, Theofanis, et al.
Published: (2025)

xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems
by: Rutishauser, Georg, et al.
Published: (2024)

FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)