| Main Authors: | Huang, Yushan, Aloufi, Ranya, Cadet, Xavier, Zhao, Yuchen, Barnaghi, Payam, Haddadi, Hamed |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2403.08040 |
Similar Items
Benchmarking Ultra-Low-Power $μ$NPUs
by: Millar, Josh, et al.
Published: (2025)
Energy-Aware Deep Learning on Resource-Constrained Hardware
by: Millar, Josh, et al.
Published: (2025)
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs
by: Zheng, Size, et al.
Published: (2024)
MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs
by: Gong, Junfeng, et al.
Published: (2024)
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
by: Yan, Minghao, et al.
Published: (2023)
Decentor-V: Lightweight ML Training on Low-Power RISC-V Edge Devices
by: Ribeiro, Marcelo, et al.
Published: (2025)
Distributed Inference with Minimal Off-Chip Traffic for Transformers on Low-Power MCUs
by: Bochem, Severin, et al.
Published: (2024)
Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks
by: Wu, Qizhe, et al.
Published: (2024)
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
by: Yang, Jianlei, et al.
Published: (2023)
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
by: Huang, Longwei, et al.
Published: (2023)
Accelerating PoT Quantization on Edge Devices
by: Saha, Rappy, et al.
Published: (2024)
An Early Experience with Confidential Computing Architecture for On-Device Model Protection
by: Abdollahi, Sina, et al.
Published: (2025)
Designing Efficient LLM Accelerators for Edge Devices
by: Haris, Jude, et al.
Published: (2024)
Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search
by: Song, Chang Eun, et al.
Published: (2025)
TeLLMe: An Energy-Efficient Ternary LLM Accelerator for Prefilling and Decoding on Edge FPGAs
by: Qiao, Ye, et al.
Published: (2025)
Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices
by: Choi, Dawon, et al.
Published: (2026)
On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration
by: Xiang, Maoyang, et al.
Published: (2025)
From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference
by: Ganti, Ravindra, et al.
Published: (2026)
MAGE: A Multi-Agent Engine for Automated RTL Code Generation
by: Zhao, Yujie, et al.
Published: (2024)
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
by: Yun, Sungmin, et al.
Published: (2024)
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs
by: Wu, Qizhe, et al.
Published: (2025)
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
by: İslamoğlu, Gamze, et al.
Published: (2023)
Exploration of Low-Power Flexible Stress Monitoring Classifiers for Conformal Wearables
by: Afentaki, Florentia, et al.
Published: (2025)
Low-Cost FlashAttention with Fused Exponential and Multiplication Hardware Operators
by: Alexandridis, Kosmas, et al.
Published: (2025)
GCN-ABFT: Low-Cost Online Error Checking for Graph Convolutional Networks
by: Peltekis, Christodoulos, et al.
Published: (2024)
Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow
by: Hsiung, Ching-Lin, et al.
Published: (2025)
Efficient FPGA Implementation of Time-Domain Popcount for Low-Complexity Machine Learning
by: Duan, Shengyu, et al.
Published: (2025)
MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation
by: Natesh, Vikas, et al.
Published: (2025)
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers
by: Scherer, Moritz, et al.
Published: (2024)
Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations
by: Fyon, Arthur, et al.
Published: (2026)
MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
by: Leon, Vasileios, et al.
Published: (2025)
LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics
by: Que, Zhiqiang, et al.
Published: (2022)
HW-SW Optimization of DNNs for Privacy-preserving People Counting on Low-resolution Infrared Arrays
by: Risso, Matteo, et al.
Published: (2024)
Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications
by: Weng, Olivia, et al.
Published: (2024)
VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
by: Wang, Run, et al.
Published: (2025)
PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
by: Andronic, Marta, et al.
Published: (2023)
SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs
by: Zhang, Jintao, et al.
Published: (2026)
Bespoke Co-processor for Energy-Efficient Health Monitoring on RISC-V-based Flexible Wearables
by: Vergos, Theofanis, et al.
Published: (2025)
xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems
by: Rutishauser, Georg, et al.
Published: (2024)
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)