| Main Authors: | Khabbazan, Bahareh, Riera, Marc, González, Antonio |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.02142 |
Similar Items
LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM
by: Hong, Junguk, et al.
Published: (2026)
ARAS: An Adaptive Low-Cost ReRAM-Based Accelerator for DNNs
by: Sabri, Mohammad, et al.
Published: (2024)
SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation
by: Han, Wontak, et al.
Published: (2024)
Hamun: An Approximate Computation Method to Prolong the Lifespan of ReRAM-Based Accelerators
by: Sabri, Mohammad, et al.
Published: (2025)
PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures
by: Lee, Dongjae, et al.
Published: (2025)
PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
by: Andronic, Marta, et al.
Published: (2023)
CD-PIM: A High-Bandwidth and Compute-Efficient LPDDR5-Based PIM for Low-Batch LLM Acceleration on Edge-Device
by: Lin, Ye, et al.
Published: (2026)
CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration
by: Ahn, Bas, et al.
Published: (2026)
THERMOS: Thermally-Aware Multi-Objective Scheduling of AI Workloads on Heterogeneous Multi-Chiplet PIM Architectures
by: Kanani, Alish, et al.
Published: (2025)
ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System
by: Kim, Youngsuk, et al.
Published: (2024)
Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2024)
A Survey on LUT-based Deep Neural Networks Implemented in FPGAs
by: Guo, Zeyu
Published: (2025)
Power-Area Efficient Serial IMPLY-based 4:2 Compressor Applied in Data-Intensive Applications
by: Bagheralmoosavi, Bahareh, et al.
Published: (2024)
The BRAM is the Limit: Shattering Myths, Shaping Standards, and Building Scalable PIM Accelerators
by: Kabir, MD Arafat, et al.
Published: (2024)
Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
by: Hyun, Bongjoon, et al.
Published: (2023)
Towards An Approach to Identify Divergences in Hardware Designs for HPC Workloads
by: Popovici, Doru Thom, et al.
Published: (2025)
Annotated PIM Bibliography
by: Kogge, Peter M.
Published: (2026)
PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
by: Lee, Dongjae, et al.
Published: (2024)
HH-PIM: Dynamic Optimization of Power and Performance with Heterogeneous-Hybrid PIM for Edge AI Devices
by: Jeon, Sangmin, et al.
Published: (2025)
SLTarch: Towards Scalable Point-Based Neural Rendering by Taming Workload Imbalance and Memory Irregularity
by: Li, Xingyang, et al.
Published: (2025)
Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads
by: Sharma, Harsh, et al.
Published: (2024)
Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication
by: Shan, Haoxuan, et al.
Published: (2025)
Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures
by: Alsop, Johnathan, et al.
Published: (2023)
TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge
by: Huang, Zhirui, et al.
Published: (2025)
PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs
by: Lou, Binglei, et al.
Published: (2024)
LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism
by: Wang, Yimin, et al.
Published: (2025)
LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
by: He, Zifan, et al.
Published: (2025)
IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
by: Seo, Minseok, et al.
Published: (2024)
DEER: Deep Runahead for Instruction Prefetching on Modern Mobile Workloads
by: Vahdatniya, Parmida, et al.
Published: (2025)
Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2025)
PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs
by: Malekar, Jinendra, et al.
Published: (2025)
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
by: Li, Guoyu, et al.
Published: (2025)
Workload-Aware Early-Stage Power Delivery Network Optimization via Architectural Power Traces
by: Hayes, Oran, et al.
Published: (2026)
DSLUT: An Asymmetric LUT and its Automatic Design Flow Based on Practical Functions
by: Yang, Moucheng, et al.
Published: (2025)
Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage
by: Pun, Junius, et al.
Published: (2025)
PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers
by: Wu, Yuting, et al.
Published: (2023)
WaSP: Warp Scheduling to Mimic Prefetching in Graphics Workloads
by: Joseph, Diya, et al.
Published: (2024)
TROOP: At-the-Roofline Performance for Vector Processors on Low Operational Intensity Workloads
by: Purayil, Navaneeth Kunhi, et al.
Published: (2025)
Control Flow Management in Modern GPUs
by: Shoushtary, Mojtaba Abaie, et al.
Published: (2024)
HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference
by: Sun, Chang, et al.
Published: (2026)