:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, W. B., Liu, Y. Q., Zang, T. H., Bao, Z. S.
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2407.05621

Similar Items

Algorithm-hardware co-design for Energy-Efficient A/D conversion in ReRAM-based accelerators
by: Zhang, Chenguang, et al.
Published: (2024)

Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)

DRACO: Co-design for DSP-Efficient Rigid Body Dynamics Accelerator
by: Liu, Xingyu, et al.
Published: (2025)

Optimizing Layer-Fused Scheduling of Transformer Networks on Multi-accelerator Platforms
by: Colleman, Steven, et al.
Published: (2024)

SkyByte: Architecting an Efficient Memory-Semantic CXL-based SSD with OS and Hardware Co-design
by: Zhang, Haoyang, et al.
Published: (2025)

A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing
by: Zhou, P. J., et al.
Published: (2024)

CAT: Customized Transformer Accelerator Framework on Versal ACAP
by: Zhang, Wenbo, et al.
Published: (2024)

SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding
by: Zhang, Junming, et al.
Published: (2026)

CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design
by: Wan, Zishen, et al.
Published: (2025)

Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2025)

A Power-Efficient Hardware Implementation of L-Mul
by: Chen, Ruiqi, et al.
Published: (2024)

ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance
by: Xie, Tong, et al.
Published: (2025)

Online Learning Extreme Learning Machine with Low-Complexity Predictive Plasticity Rule and FPGA Implementation
by: Zang, Zhenya, et al.
Published: (2025)

Performance evaluation of acceleration of convolutional layers on OpenEdgeCGRA
by: Carpentieri, Nicolò, et al.
Published: (2024)

@NTT: Algorithm-Targeted NTT hardware acceleration via Design-Time Constant Optimization
by: Nabeel, Mohammed, et al.
Published: (2026)

How to keep pushing ML accelerator performance? Know your rooflines!
by: Verhelst, Marian, et al.
Published: (2025)

MatrixFlow: System-Accelerator co-design for high-performance transformer applications
by: Liu, Qunyou, et al.
Published: (2025)

LEXI: Lossless Exponent Coding for Efficient Inter-Chiplet Communication in Hybrid LLMs
by: Sun, Miao, et al.
Published: (2026)

Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference
by: Wang, Xinyu, et al.
Published: (2026)

An Efficient Algorithm for Modulus Operation and Its Hardware Implementation in Prime Number Calculation
by: Wijesinghe, W. A. Susantha
Published: (2024)

Evaluating Four FPGA-accelerated Space Use Cases based on Neural Network Algorithms for On-board Inference
by: Antunes, Pedro, et al.
Published: (2026)

SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator
by: Wang, Peipei, et al.
Published: (2025)

SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
by: Wang, Wenxun, et al.
Published: (2025)

CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration
by: Ahn, Bas, et al.
Published: (2026)

Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
by: Xia, Tianhua, et al.
Published: (2025)

Building a Reusable and Extensible Automatic Compiler Infrastructure for Reconfigurable Devices
by: Zang, Zhenya, et al.
Published: (2023)

MING: An Automated CNN-to-Edge MLIR HLS framework
by: Bi, Jiahong, et al.
Published: (2026)

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead
by: Cammarata, Danilo, et al.
Published: (2026)

CRYPTONITE: Scalable Accelerator Design for Cryptographic Primitives and Algorithms
by: Maheswaran, Karthikeya Sharma, et al.
Published: (2025)

Algorithm and Hardware Co-Design for Efficient Complex-Valued Uncertainty Estimation
by: Zhang, Zehuan, et al.
Published: (2026)

Efficient Multi-Cycle Folded Integer Multipliers
by: Houraniah, Ahmad, et al.
Published: (2023)

A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture
by: Raman, Siddhartha Raman Sundara, et al.
Published: (2026)

EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring
by: Bao, Xuanhao, et al.
Published: (2026)

Analyzing the capabilities of HLS and RTL tools in the design of an FPGA Montgomery Multiplier
by: Ifrim, Rares, et al.
Published: (2025)

A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks
by: Raman, Siddhartha Raman Sundara, et al.
Published: (2026)

ERASER: Efficient RTL FAult Simulation Framework with Trimmed Execution Redundancy
by: Tang, Jiaping, et al.
Published: (2025)

Hardware-Efficient Accurate 4-bit Multiplier for Xilinx 7 Series FPGAs
by: Kida, Misaki, et al.
Published: (2025)

Accelerator-assisted Floating-point ASIP for Communication and Positioning in Massive MIMO Systems
by: Attari, Mohammad, et al.
Published: (2025)

Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models
by: Wei, Chiyue, et al.
Published: (2025)

SSRESF: Sensitivity-aware Single-particle Radiation Effects Simulation Framework in SoC Platforms based on SVM Algorithm
by: Liu, Meng, et al.
Published: (2024)