:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Hannah; Kim, Sohyeon; Kim, Saeyeon; Lee, Jiyoung; Roh, Huijin; Kim, Ji-Hoon
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2502.17729

Similar Items

SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models
by: Yang, Jinho, et al.
Published: (2025)

SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference
by: Choi, Yuseon, et al.
Published: (2025)

Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices
by: Choi, Dawon, et al.
Published: (2026)

SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation
by: Han, Wontak, et al.
Published: (2024)

IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
by: Seo, Minseok, et al.
Published: (2024)

Hardware-based Heterogeneous Memory Management for Large Language Model Inference
by: Hwang, Soojin, et al.
Published: (2025)

PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
by: Lee, Dongjae, et al.
Published: (2024)

LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference
by: Moon, Seungjae, et al.
Published: (2024)

AnalogToBi: Device-Level Analog Circuit Topology Generation via Bipartite Graph and Grammar Guided Decoding
by: Kim, Seungmin, et al.
Published: (2026)

RoMe: Row Granularity Access Memory System for Large Language Models
by: Nam, Hwayong, et al.
Published: (2025)

Cerberus: Cross-Layer ECC Co-Design for Robust and Efficient Memory Protection
by: Kim, Junhwan, et al.
Published: (2026)

Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats
by: Noh, Seock-Hwan, et al.
Published: (2025)

Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
by: Kim, Dowon, et al.
Published: (2025)

MASQ: Accelerating Masked Diffusion via Stage-Wise Multi-Precision Quantization
by: Kim, Seeyeon, et al.
Published: (2026)

Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
by: Hyun, Bongjoon, et al.
Published: (2023)

Full System Architecture Modeling for Wearable Egocentric Contextual AI
by: Lee, Vincent T., et al.
Published: (2025)

CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies
by: Gouk, Donghyun, et al.
Published: (2025)

PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2024)

Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
by: Song, Chang Eun, et al.
Published: (2025)

Modeling and Simulation Frameworks for Processing-in-Memory Architectures
by: Aghaei, Mahdi, et al.
Published: (2025)

RED: Energy Optimization Framework for eDRAM-based PIM with Reconfigurable Voltage Swing and Retention-aware Scheduling
by: Kim, Jae-Young, et al.
Published: (2025)

Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)

MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness
by: Wang, Huizheng, et al.
Published: (2025)

RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping
by: Baik, Jangho, et al.
Published: (2026)

PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures
by: Lee, Dongjae, et al.
Published: (2025)

Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory
by: Hong, Jeongmin, et al.
Published: (2024)

Token-Picker: Accelerating Attention in Text Generation with Minimized Memory Transfer via Probability Estimation
by: Park, Junyoung, et al.
Published: (2024)

STAR: Improving Lifetime and Performance of High-Capacity Modern SSDs Using State-Aware Randomizer
by: Kwon, Omin, et al.
Published: (2025)

IBEX: Internal Bandwidth-Efficient Compression Architecture for Scalable CXL Memory Expansion
by: Ko, Younghoon, et al.
Published: (2026)

MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures
by: Kang, Do Yeong, et al.
Published: (2025)

Optimizing and Exploring System Performance in Compact Processing-in-Memory-based Chips
by: Chen, Peilin, et al.
Published: (2025)

Configurable Multi-Port Memory Architecture for High-Speed Data Communication
by: Dhakad, Narendra Singh, et al.
Published: (2024)

EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
by: Heo, Jaehoon, et al.
Published: (2025)

HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality
by: Shin, Hery, et al.
Published: (2024)

STRAW: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs
by: Chun, Myoungjun, et al.
Published: (2025)

Cosmos: A CXL-Based Full In-Memory System for Approximate Nearest Neighbor Search
by: Ko, Seoyoung, et al.
Published: (2025)

Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)

SoK: Systematizing a Decade of Architectural RowHammer Defenses Through the Lens of Streaming Algorithms
by: Kim, Michael Jaemin, et al.
Published: (2025)

A Node-Based Polar List Decoder with Frame Interleaving and Ensemble Decoding Support
by: Ren, Yuqing, et al.
Published: (2024)

LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM
by: Hong, Junguk, et al.
Published: (2026)