:: Library Catalog

Bibliographic Details
Main Authors:	He, Mingxuan, Thottethodi, Mithuna, Vijaykumar, T. N.
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2404.04708

Similar Items

QED: Scalable Verification of Hardware Memory Consistency
by: Ravi, Gokulan, et al.
Published: (2024)

NetSmith: An Optimization Framework for Machine-Discovered Network Topologies
by: Green, Conor, et al.
Published: (2024)

PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2024)

Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing
by: Huang, Xiaotong, et al.
Published: (2025)

A Composable Dynamic Sparse Dataflow Architecture for Efficient Event-based Vision Processing on FPGA
by: Gao, Yizhao, et al.
Published: (2024)

APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption
by: Ding, Lin, et al.
Published: (2024)

In-Memory Computing Architecture for Efficient Hardware Security
by: Ajmi, Hala, et al.
Published: (2024)

XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
by: Yu, Feng, et al.
Published: (2026)

HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference
by: Duan, Cenlin, et al.
Published: (2025)

LP-Spec: Leveraging LPDDR PIM for Efficient LLM Mobile Speculative Inference with Architecture-Dataflow Co-Optimization
by: He, Siyuan, et al.
Published: (2025)

Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference
by: Wang, Xinyu, et al.
Published: (2026)

FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture
by: Xuan, Zihao, et al.
Published: (2026)

PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures
by: Lee, Dongjae, et al.
Published: (2025)

A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network
by: Jiang, Aojie, et al.
Published: (2026)

Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)

DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2023)

PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
by: Chong, Yue Jiet, et al.
Published: (2026)

LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference
by: Zheng, Jianing, et al.
Published: (2025)

SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs
by: Zhang, Jintao, et al.
Published: (2026)

CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures
by: Qi, Yingjie, et al.
Published: (2025)

Enabling Efficient Transaction Processing on CXL-Based Memory Sharing
by: Wang, Zhao, et al.
Published: (2025)

IBEX: Internal Bandwidth-Efficient Compression Architecture for Scalable CXL Memory Expansion
by: Ko, Younghoon, et al.
Published: (2026)

Modeling and Simulation Frameworks for Processing-in-Memory Architectures
by: Aghaei, Mahdi, et al.
Published: (2025)

GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing
by: Pei, Minnan, et al.
Published: (2025)

Tasa: Thermal-aware 3D-Stacked Architecture Design with Bandwidth Sharing for LLM Inference
by: He, Siyuan, et al.
Published: (2025)

A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents
by: Liao, Zhipeng, et al.
Published: (2025)

ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing
by: You, Kang, et al.
Published: (2026)

TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge
by: Huang, Zhirui, et al.
Published: (2025)

Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs
by: de Lima, João Paulo Cardoso, et al.
Published: (2025)

SmaRTLy: RTL Optimization with Logic Inferencing and Structural Rebuilding
by: Li, Chengxi, et al.
Published: (2025)

3D-TrIM: A Memory-Efficient Spatial Computing Architecture for Convolution Workloads
by: Sestito, Cristian, et al.
Published: (2025)

The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference
by: Li, Fang
Published: (2025)

Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
by: Song, Chang Eun, et al.
Published: (2025)

LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
by: He, Zifan, et al.
Published: (2025)

Empowering Malware Detection Efficiency within Processing-in-Memory Architecture
by: Kasarapu, Sreenitha, et al.
Published: (2024)

End-to-End Transformer Acceleration Through Processing-in-Memory Architectures
by: Yang, Xiaoxuan, et al.
Published: (2025)

Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
by: Wolters, Christopher, et al.
Published: (2024)

Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design
by: Xie, Rui, et al.
Published: (2025)

Octopus: Enhancing CXL Memory Pods via Sparse Topology
by: Zhong, Yuhong, et al.
Published: (2025)

SpANNS: Optimizing Approximate Nearest Neighbor Search for Sparse Vectors Using Near Memory Processing
by: Zhang, Tianqi, et al.
Published: (2026)