:: Library Catalog

Bibliographic Details
Main Authors:	Price, Daniel; Vellaisamy, Prabhu; Shen, John; Wu, Di
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Hardware Architecture
Online Access:	https://arxiv.org/abs/2601.10823

Similar Items

tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI
by: Nair, Harideep, et al.
Published: (2024)

Catwalk: Unary Top-K for Efficient Ramp-No-Leak Neuron Design for Temporal Neural Networks
by: Lister, Devon, et al.
Published: (2025)

Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators
by: Vellaisamy, Prabhu, et al.
Published: (2026)

Commercial Evaluation of Zero-Skipping MAC Design for Bit Sparsity Exploitation in DL Inference
by: Nair, Harideep, et al.
Published: (2024)

Algorithm and Hardware Co-Design for Efficient Complex-Valued Uncertainty Estimation
by: Zhang, Zehuan, et al.
Published: (2026)

Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs
by: de Lima, João Paulo Cardoso, et al.
Published: (2025)

MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
by: Leon, Vasileios, et al.
Published: (2025)

tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit
by: Vellaisamy, Prabhu, et al.
Published: (2024)

NeuroAI Temporal Neural Networks (NeuTNNs): Microarchitecture and Design Framework for Specialized Neuromorphic Processing Units
by: Venkatachalam, Shanmuga, et al.
Published: (2026)

PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design
by: Wu, Yuchao, et al.
Published: (2025)

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning
by: Shen, Chaoyao, et al.
Published: (2026)

Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs
by: Ahmadilivani, Mohammad Hasan, et al.
Published: (2026)

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond
by: Abi-Karam, Stefan, et al.
Published: (2024)

Tempus Core: Area-Power Efficient Temporal-Unary Convolution Core for Low-Precision Edge DLAs
by: Vellaisamy, Prabhu, et al.
Published: (2024)

RL-MUL 2.0: Multiplier Design Optimization with Parallel Deep Reinforcement Learning and Space Reduction
by: Zuo, Dongsheng, et al.
Published: (2024)

Hardware-Aware Data and Instruction Mapping for AI Tasks: Balancing Parallelism, I/O and Memory Tradeoffs
by: Chowdhury, Md Rownak Hossain, et al.
Published: (2025)

Deep Inverse Design for High-Level Synthesis
by: Chang, Ping, et al.
Published: (2024)

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes
by: Wang, Miaoxin, et al.
Published: (2024)

TNNGen: Automated Design of Neuromorphic Sensory Processing Units for Time-Series Clustering
by: Vellaisamy, Prabhu, et al.
Published: (2024)

FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference
by: Hsieh, Fen-Yu, et al.
Published: (2025)

Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks
by: Wu, Qizhe, et al.
Published: (2024)

ACE-RTL: When Agentic Context Evolution Meets RTL-Specialized LLMs
by: Deng, Chenhui, et al.
Published: (2026)

TurboAttention: Efficient Attention Approximation For High Throughputs LLMs
by: Kang, Hao, et al.
Published: (2024)

EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
by: Hu, Guangyu, et al.
Published: (2026)

RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects
by: Allam, Ahmed, et al.
Published: (2024)

COMET: Towards Partical W4A4KV4 LLMs Serving
by: Liu, Lian, et al.
Published: (2024)

Efficient and Reliable Vector Similarity Search Using Asymmetric Encoding with NAND-Flash for Many-Class Few-Shot Learning
by: Chiang, Hao-Wei, et al.
Published: (2024)

Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores
by: Shen, Yixian, et al.
Published: (2026)

NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction
by: Wadle, Shayne, et al.
Published: (2025)

A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface
by: Yin, Guodong, et al.
Published: (2022)

Efficient Tabular Data Preprocessing of ML Pipelines
by: Zhu, Yu, et al.
Published: (2024)

TransAxx: Efficient Transformers with Approximate Computing
by: Danopoulos, Dimitrios, et al.
Published: (2024)

Designing Efficient LLM Accelerators for Edge Devices
by: Haris, Jude, et al.
Published: (2024)

FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)

Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models
by: Xu, Lei, et al.
Published: (2025)

HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
by: He, Andy, et al.
Published: (2024)

Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing
by: Shin, Duckgyu, et al.
Published: (2026)

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
by: Wang, Chenyu, et al.
Published: (2023)

CircuitVAE: Efficient and Scalable Latent Circuit Optimization
by: Song, Jialin, et al.
Published: (2024)

TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
by: Yang, Jianlei, et al.
Published: (2023)