| Main Authors: | Kabir, MD Arafat, Kamucheka, Tendayi, Fredricks, Nathaniel, Mandebi, Joel, Bakos, Jason, Huang, Miaoqing, Andrews, David |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.04367 |
Similar Items
The BRAM is the Limit: Shattering Myths, Shaping Standards, and Building Scalable PIM Accelerators
by: Kabir, MD Arafat, et al.
Published: (2024)
A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs
by: Kabir, Ehsan, et al.
Published: (2024)
FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs
by: Kabir, Ehsan, et al.
Published: (2024)
ProTEA: Programmable Transformer Encoder Acceleration on FPGA
by: Kabir, Ehsan, et al.
Published: (2024)
N-TORC: Native Tensor Optimizer for Real-time Constraints
by: Singh, Suyash Vardhan, et al.
Published: (2025)
Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)
SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV
by: Zhang, Jingyao, et al.
Published: (2025)
FireFly-T: High-Throughput Sparsity Exploitation for Spiking Transformer Acceleration with Dual-Engine Overlay Architecture
by: Li, Tenglong, et al.
Published: (2025)
DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators
by: Yi, Xiaoling, et al.
Published: (2025)
To Overlay or to Customize? Revisiting Architectural Choices in Heterogeneous Systems
by: Chen, Xingzhen, et al.
Published: (2026)
Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)
Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2024)
Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality
by: Hoornaert, Denis, et al.
Published: (2026)
ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics
by: Gao, Ruijie, et al.
Published: (2026)
SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation
by: He, Zicheng, et al.
Published: (2026)
Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication
by: Ohno, Ayumi, et al.
Published: (2025)
GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)
TYTAN: Taylor-series based Non-Linear Activation Engine for Deep Learning Accelerators
by: Pramanik, Soham, et al.
Published: (2025)
LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators
by: Li, Changhong, et al.
Published: (2025)
RACE-IT: A Reconfigurable Analog Computing Engine for In-Memory Transformer Acceleration
by: Zhao, Lei, et al.
Published: (2023)
Stream-HLS: Towards Automatic Dataflow Acceleration
by: Basalama, Suhail, et al.
Published: (2025)
Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)
Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks
by: Lin, Xipeng, et al.
Published: (2024)
Bancroft: Genomics Acceleration Beyond On-Device Memory
by: Lim, Se-Min, et al.
Published: (2025)
CrossNAS: A Cross-Layer Neural Architecture Search Framework for PIM Systems
by: Amin, Md Hasibul, et al.
Published: (2025)
ADS-IMC: Accelerating Data Sorting with In-Memory Computation
by: Dhakad, Narendra Singh, et al.
Published: (2026)
An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators
by: Qararyah, Fareed, et al.
Published: (2025)
Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)
Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits
by: Jin, Haoran, et al.
Published: (2025)
PIMCOMP: An End-to-End DNN Compiler for Processing-In-Memory Accelerators
by: Sun, Xiaotian, et al.
Published: (2024)
AME-PIM: Can Memory be Your Next Tensor Accelerator?
by: Venieri, Emanuele, et al.
Published: (2026)
Generalized Ping-Pong: Off-Chip Memory Bandwidth Centric Pipelining Strategy for Processing-In-Memory Accelerators
by: Wang, Ruibao, et al.
Published: (2024)
CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool
by: Andrulis, Tanner, et al.
Published: (2024)
PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators
by: Wang, Xinyu, et al.
Published: (2024)
IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
by: Seo, Minseok, et al.
Published: (2024)
Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing
by: Wang, Chuanzhen, et al.
Published: (2026)
AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator
by: Xue, Chenhao, et al.
Published: (2026)
AutoRAC: Automated Processing-in-Memory Accelerator Design for Recommender Systems
by: Cheng, Feng, et al.
Published: (2025)
PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers
by: Wu, Yuting, et al.
Published: (2023)
CAMASim: A Comprehensive Simulation Framework for Content-Addressable Memory based Accelerators
by: Li, Mengyuan, et al.
Published: (2024)