| Main Authors: | Price, Daniel, Vellaisamy, Prabhu, Shen, John, Wu, Di |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.10823 |
Similar Items
tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI
by: Nair, Harideep, et al.
Published: (2024)
Catwalk: Unary Top-K for Efficient Ramp-No-Leak Neuron Design for Temporal Neural Networks
by: Lister, Devon, et al.
Published: (2025)
Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators
by: Vellaisamy, Prabhu, et al.
Published: (2026)
Commercial Evaluation of Zero-Skipping MAC Design for Bit Sparsity Exploitation in DL Inference
by: Nair, Harideep, et al.
Published: (2024)
Algorithm and Hardware Co-Design for Efficient Complex-Valued Uncertainty Estimation
by: Zhang, Zehuan, et al.
Published: (2026)
Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs
by: de Lima, João Paulo Cardoso, et al.
Published: (2025)
MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
by: Leon, Vasileios, et al.
Published: (2025)
tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit
by: Vellaisamy, Prabhu, et al.
Published: (2024)
NeuroAI Temporal Neural Networks (NeuTNNs): Microarchitecture and Design Framework for Specialized Neuromorphic Processing Units
by: Venkatachalam, Shanmuga, et al.
Published: (2026)
PICBench: Benchmarking LLMs for Photonic Integrated Circuits Design
by: Wu, Yuchao, et al.
Published: (2025)
TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning
by: Shen, Chaoyao, et al.
Published: (2026)
Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs
by: Ahmadilivani, Mohammad Hasan, et al.
Published: (2026)
HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond
by: Abi-Karam, Stefan, et al.
Published: (2024)
Tempus Core: Area-Power Efficient Temporal-Unary Convolution Core for Low-Precision Edge DLAs
by: Vellaisamy, Prabhu, et al.
Published: (2024)
RL-MUL 2.0: Multiplier Design Optimization with Parallel Deep Reinforcement Learning and Space Reduction
by: Zuo, Dongsheng, et al.
Published: (2024)
Hardware-Aware Data and Instruction Mapping for AI Tasks: Balancing Parallelism, I/O and Memory Tradeoffs
by: Chowdhury, Md Rownak Hossain, et al.
Published: (2025)
Deep Inverse Design for High-Level Synthesis
by: Chang, Ping, et al.
Published: (2024)
An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes
by: Wang, Miaoxin, et al.
Published: (2024)
TNNGen: Automated Design of Neuromorphic Sensory Processing Units for Time-Series Clustering
by: Vellaisamy, Prabhu, et al.
Published: (2024)
FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference
by: Hsieh, Fen-Yu, et al.
Published: (2025)
Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks
by: Wu, Qizhe, et al.
Published: (2024)
ACE-RTL: When Agentic Context Evolution Meets RTL-Specialized LLMs
by: Deng, Chenhui, et al.
Published: (2026)
TurboAttention: Efficient Attention Approximation For High Throughputs LLMs
by: Kang, Hao, et al.
Published: (2024)
EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
by: Hu, Guangyu, et al.
Published: (2026)
RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects
by: Allam, Ahmed, et al.
Published: (2024)
COMET: Towards Partical W4A4KV4 LLMs Serving
by: Liu, Lian, et al.
Published: (2024)
Efficient and Reliable Vector Similarity Search Using Asymmetric Encoding with NAND-Flash for Many-Class Few-Shot Learning
by: Chiang, Hao-Wei, et al.
Published: (2024)
Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores
by: Shen, Yixian, et al.
Published: (2026)
NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction
by: Wadle, Shayne, et al.
Published: (2025)
A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface
by: Yin, Guodong, et al.
Published: (2022)
Efficient Tabular Data Preprocessing of ML Pipelines
by: Zhu, Yu, et al.
Published: (2024)
TransAxx: Efficient Transformers with Approximate Computing
by: Danopoulos, Dimitrios, et al.
Published: (2024)
Designing Efficient LLM Accelerators for Edge Devices
by: Haris, Jude, et al.
Published: (2024)
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)
Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models
by: Xu, Lei, et al.
Published: (2025)
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
by: He, Andy, et al.
Published: (2024)
Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing
by: Shin, Duckgyu, et al.
Published: (2026)
EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
by: Wang, Chenyu, et al.
Published: (2023)
CircuitVAE: Efficient and Scalable Latent Circuit Optimization
by: Song, Jialin, et al.
Published: (2024)
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
by: Yang, Jianlei, et al.
Published: (2023)