| Main Authors: | Yang, Hannah, Kim, Sohyeon, Kim, Saeyeon, Lee, Jiyoung, Roh, Huijin, Kim, Ji-Hoon |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.17729 |
Similar Items
SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models
by: Yang, Jinho, et al.
Published: (2025)
SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference
by: Choi, Yuseon, et al.
Published: (2025)
Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices
by: Choi, Dawon, et al.
Published: (2026)
SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation
by: Han, Wontak, et al.
Published: (2024)
IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
by: Seo, Minseok, et al.
Published: (2024)
Hardware-based Heterogeneous Memory Management for Large Language Model Inference
by: Hwang, Soojin, et al.
Published: (2025)
PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
by: Lee, Dongjae, et al.
Published: (2024)
LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference
by: Moon, Seungjae, et al.
Published: (2024)
AnalogToBi: Device-Level Analog Circuit Topology Generation via Bipartite Graph and Grammar Guided Decoding
by: Kim, Seungmin, et al.
Published: (2026)
RoMe: Row Granularity Access Memory System for Large Language Models
by: Nam, Hwayong, et al.
Published: (2025)
Cerberus: Cross-Layer ECC Co-Design for Robust and Efficient Memory Protection
by: Kim, Junhwan, et al.
Published: (2026)
Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats
by: Noh, Seock-Hwan, et al.
Published: (2025)
Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
by: Kim, Dowon, et al.
Published: (2025)
MASQ: Accelerating Masked Diffusion via Stage-Wise Multi-Precision Quantization
by: Kim, Seeyeon, et al.
Published: (2026)
Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
by: Hyun, Bongjoon, et al.
Published: (2023)
Full System Architecture Modeling for Wearable Egocentric Contextual AI
by: Lee, Vincent T., et al.
Published: (2025)
CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies
by: Gouk, Donghyun, et al.
Published: (2025)
PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2024)
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
by: Song, Chang Eun, et al.
Published: (2025)
Modeling and Simulation Frameworks for Processing-in-Memory Architectures
by: Aghaei, Mahdi, et al.
Published: (2025)
RED: Energy Optimization Framework for eDRAM-based PIM with Reconfigurable Voltage Swing and Retention-aware Scheduling
by: Kim, Jae-Young, et al.
Published: (2025)
Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)
MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness
by: Wang, Huizheng, et al.
Published: (2025)
RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping
by: Baik, Jangho, et al.
Published: (2026)
PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures
by: Lee, Dongjae, et al.
Published: (2025)
Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory
by: Hong, Jeongmin, et al.
Published: (2024)
Token-Picker: Accelerating Attention in Text Generation with Minimized Memory Transfer via Probability Estimation
by: Park, Junyoung, et al.
Published: (2024)
STAR: Improving Lifetime and Performance of High-Capacity Modern SSDs Using State-Aware Randomizer
by: Kwon, Omin, et al.
Published: (2025)
IBEX: Internal Bandwidth-Efficient Compression Architecture for Scalable CXL Memory Expansion
by: Ko, Younghoon, et al.
Published: (2026)
MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures
by: Kang, Do Yeong, et al.
Published: (2025)
Optimizing and Exploring System Performance in Compact Processing-in-Memory-based Chips
by: Chen, Peilin, et al.
Published: (2025)
Configurable Multi-Port Memory Architecture for High-Speed Data Communication
by: Dhakad, Narendra Singh, et al.
Published: (2024)
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
by: Heo, Jaehoon, et al.
Published: (2025)
HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality
by: Shin, Hery, et al.
Published: (2024)
STRAW: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs
by: Chun, Myoungjun, et al.
Published: (2025)
Cosmos: A CXL-Based Full In-Memory System for Approximate Nearest Neighbor Search
by: Ko, Seoyoung, et al.
Published: (2025)
Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)
SoK: Systematizing a Decade of Architectural RowHammer Defenses Through the Lens of Streaming Algorithms
by: Kim, Michael Jaemin, et al.
Published: (2025)
A Node-Based Polar List Decoder with Frame Interleaving and Ensemble Decoding Support
by: Ren, Yuqing, et al.
Published: (2024)
LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM
by: Hong, Junguk, et al.
Published: (2026)