| Main Authors: | Zhang, W. B., Liu, Y. Q., Zang, T. H., Bao, Z. S. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.05621 |
Similar Items
Algorithm-hardware co-design for Energy-Efficient A/D conversion in ReRAM-based accelerators
by: Zhang, Chenguang, et al.
Published: (2024)
Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)
DRACO: Co-design for DSP-Efficient Rigid Body Dynamics Accelerator
by: Liu, Xingyu, et al.
Published: (2025)
Optimizing Layer-Fused Scheduling of Transformer Networks on Multi-accelerator Platforms
by: Colleman, Steven, et al.
Published: (2024)
SkyByte: Architecting an Efficient Memory-Semantic CXL-based SSD with OS and Hardware Co-design
by: Zhang, Haoyang, et al.
Published: (2025)
A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing
by: Zhou, P. J., et al.
Published: (2024)
CAT: Customized Transformer Accelerator Framework on Versal ACAP
by: Zhang, Wenbo, et al.
Published: (2024)
SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding
by: Zhang, Junming, et al.
Published: (2026)
CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design
by: Wan, Zishen, et al.
Published: (2025)
Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2025)
A Power-Efficient Hardware Implementation of L-Mul
by: Chen, Ruiqi, et al.
Published: (2024)
ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance
by: Xie, Tong, et al.
Published: (2025)
Online Learning Extreme Learning Machine with Low-Complexity Predictive Plasticity Rule and FPGA Implementation
by: Zang, Zhenya, et al.
Published: (2025)
Performance evaluation of acceleration of convolutional layers on OpenEdgeCGRA
by: Carpentieri, Nicolò, et al.
Published: (2024)
@NTT: Algorithm-Targeted NTT hardware acceleration via Design-Time Constant Optimization
by: Nabeel, Mohammed, et al.
Published: (2026)
How to keep pushing ML accelerator performance? Know your rooflines!
by: Verhelst, Marian, et al.
Published: (2025)
MatrixFlow: System-Accelerator co-design for high-performance transformer applications
by: Liu, Qunyou, et al.
Published: (2025)
LEXI: Lossless Exponent Coding for Efficient Inter-Chiplet Communication in Hybrid LLMs
by: Sun, Miao, et al.
Published: (2026)
Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference
by: Wang, Xinyu, et al.
Published: (2026)
An Efficient Algorithm for Modulus Operation and Its Hardware Implementation in Prime Number Calculation
by: Wijesinghe, W. A. Susantha
Published: (2024)
Evaluating Four FPGA-accelerated Space Use Cases based on Neural Network Algorithms for On-board Inference
by: Antunes, Pedro, et al.
Published: (2026)
SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator
by: Wang, Peipei, et al.
Published: (2025)
SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
by: Wang, Wenxun, et al.
Published: (2025)
CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration
by: Ahn, Bas, et al.
Published: (2026)
Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
by: Xia, Tianhua, et al.
Published: (2025)
Building a Reusable and Extensible Automatic Compiler Infrastructure for Reconfigurable Devices
by: Zang, Zhenya, et al.
Published: (2023)
MING: An Automated CNN-to-Edge MLIR HLS framework
by: Bi, Jiahong, et al.
Published: (2026)
O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead
by: Cammarata, Danilo, et al.
Published: (2026)
CRYPTONITE: Scalable Accelerator Design for Cryptographic Primitives and Algorithms
by: Maheswaran, Karthikeya Sharma, et al.
Published: (2025)
Algorithm and Hardware Co-Design for Efficient Complex-Valued Uncertainty Estimation
by: Zhang, Zehuan, et al.
Published: (2026)
Efficient Multi-Cycle Folded Integer Multipliers
by: Houraniah, Ahmad, et al.
Published: (2023)
A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture
by: Raman, Siddhartha Raman Sundara, et al.
Published: (2026)
EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring
by: Bao, Xuanhao, et al.
Published: (2026)
Analyzing the capabilities of HLS and RTL tools in the design of an FPGA Montgomery Multiplier
by: Ifrim, Rares, et al.
Published: (2025)
A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks
by: Raman, Siddhartha Raman Sundara, et al.
Published: (2026)
ERASER: Efficient RTL FAult Simulation Framework with Trimmed Execution Redundancy
by: Tang, Jiaping, et al.
Published: (2025)
Hardware-Efficient Accurate 4-bit Multiplier for Xilinx 7 Series FPGAs
by: Kida, Misaki, et al.
Published: (2025)
Accelerator-assisted Floating-point ASIP for Communication and Positioning in Massive MIMO Systems
by: Attari, Mohammad, et al.
Published: (2025)
Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models
by: Wei, Chiyue, et al.
Published: (2025)
SSRESF: Sensitivity-aware Single-particle Radiation Effects Simulation Framework in SoC Platforms based on SVM Algorithm
by: Liu, Meng, et al.
Published: (2024)