Bibliographic Details
Main Authors:	Khabbazan, Bahareh; Riera, Marc; González, Antonio
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2502.02142

Similar Items

LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM
by: Hong, Junguk, et al.
Published: (2026)

ARAS: An Adaptive Low-Cost ReRAM-Based Accelerator for DNNs
by: Sabri, Mohammad, et al.
Published: (2024)

SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation
by: Han, Wontak, et al.
Published: (2024)

Hamun: An Approximate Computation Method to Prolong the Lifespan of ReRAM-Based Accelerators
by: Sabri, Mohammad, et al.
Published: (2025)

PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures
by: Lee, Dongjae, et al.
Published: (2025)

PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
by: Andronic, Marta, et al.
Published: (2023)

CD-PIM: A High-Bandwidth and Compute-Efficient LPDDR5-Based PIM for Low-Batch LLM Acceleration on Edge-Device
by: Lin, Ye, et al.
Published: (2026)

CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration
by: Ahn, Bas, et al.
Published: (2026)

THERMOS: Thermally-Aware Multi-Objective Scheduling of AI Workloads on Heterogeneous Multi-Chiplet PIM Architectures
by: Kanani, Alish, et al.
Published: (2025)

ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System
by: Kim, Youngsuk, et al.
Published: (2024)

Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2024)

A Survey on LUT-based Deep Neural Networks Implemented in FPGAs
by: Guo, Zeyu
Published: (2025)

Power-Area Efficient Serial IMPLY-based 4:2 Compressor Applied in Data-Intensive Applications
by: Bagheralmoosavi, Bahareh, et al.
Published: (2024)

The BRAM is the Limit: Shattering Myths, Shaping Standards, and Building Scalable PIM Accelerators
by: Kabir, MD Arafat, et al.
Published: (2024)

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
by: Hyun, Bongjoon, et al.
Published: (2023)

Towards An Approach to Identify Divergences in Hardware Designs for HPC Workloads
by: Popovici, Doru Thom, et al.
Published: (2025)

Annotated PIM Bibliography
by: Kogge, Peter M.
Published: (2026)

PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
by: Lee, Dongjae, et al.
Published: (2024)

HH-PIM: Dynamic Optimization of Power and Performance with Heterogeneous-Hybrid PIM for Edge AI Devices
by: Jeon, Sangmin, et al.
Published: (2025)

SLTarch: Towards Scalable Point-Based Neural Rendering by Taming Workload Imbalance and Memory Irregularity
by: Li, Xingyang, et al.
Published: (2025)

Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads
by: Sharma, Harsh, et al.
Published: (2024)

Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication
by: Shan, Haoxuan, et al.
Published: (2025)

Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures
by: Alsop, Johnathan, et al.
Published: (2023)

TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge
by: Huang, Zhirui, et al.
Published: (2025)

PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs
by: Lou, Binglei, et al.
Published: (2024)

LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism
by: Wang, Yimin, et al.
Published: (2025)

LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
by: He, Zifan, et al.
Published: (2025)

IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
by: Seo, Minseok, et al.
Published: (2024)

DEER: Deep Runahead for Instruction Prefetching on Modern Mobile Workloads
by: Vahdatniya, Parmida, et al.
Published: (2025)

Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2025)

PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs
by: Malekar, Jinendra, et al.
Published: (2025)

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
by: Li, Guoyu, et al.
Published: (2025)

Workload-Aware Early-Stage Power Delivery Network Optimization via Architectural Power Traces
by: Hayes, Oran, et al.
Published: (2026)

DSLUT: An Asymmetric LUT and its Automatic Design Flow Based on Practical Functions
by: Yang, Moucheng, et al.
Published: (2025)

Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage
by: Pun, Junius, et al.
Published: (2025)

PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers
by: Wu, Yuting, et al.
Published: (2023)

WaSP: Warp Scheduling to Mimic Prefetching in Graphics Workloads
by: Joseph, Diya, et al.
Published: (2024)

TROOP: At-the-Roofline Performance for Vector Processors on Low Operational Intensity Workloads
by: Purayil, Navaneeth Kunhi, et al.
Published: (2025)

Control Flow Management in Modern GPUs
by: Shoushtary, Mojtaba Abaie, et al.
Published: (2024)

HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference
by: Sun, Chang, et al.
Published: (2026)