| Main Authors: | Zhang, Tong; Mailthody, Vikram Sharma; Sun, Fei; Ma, Linsen; Newburn, Chris J.; Zhang, Teresa; Liu, Yang; Li, Jiangpeng; Zhong, Hao; Hwu, Wen-Mei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.03944 |
Similar Items
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses
by: Park, Jeongmin Brian, et al.
Published: (2023)
Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design
by: Xie, Rui, et al.
Published: (2025)
Multiport Support for Vortex OpenGPU Memory Hierarchy
by: Shin, Injae, et al.
Published: (2025)
Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System
by: Fang, Yunhua, et al.
Published: (2025)
TRACE: Unlocking Effective CXL Bandwidth via Lossless Compression and Precision Scaling
by: Xie, Rui, et al.
Published: (2025)
Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure
by: Xie, Rui, et al.
Published: (2025)
Making Strong Error-Correcting Codes Work Effectively for HBM in AI Inference
by: Xie, Rui, et al.
Published: (2025)
Apparate: Evading Memory Hierarchy with GodSpeed Wireless-on-Chip
by: GS, Nitesh Narayana, et al.
Published: (2024)
Theodosian: A Deep Dive into Memory-Hierarchy-Centric FHE Acceleration
by: Choi, Wonseok, et al.
Published: (2025)
A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator
by: Bause, Oliver, et al.
Published: (2024)
Memory Hierarchy Design for Caching Middleware in the Age of NVM
by: Ghandeharizadeh, Shahram, et al.
Published: (2025)
täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)
by: Srinivasan, Pranav, et al.
Published: (2026)
A Modern Primer on Processing in Memory
by: Mutlu, Onur, et al.
Published: (2020)
Revisiting Main Memory-Based Covert and Side Channel Attacks in the Context of Processing-in-Memory
by: Bostanci, F. Nisa, et al.
Published: (2024)
SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization
by: Xie, Rui, et al.
Published: (2024)
Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)
Performance Characterizations and Usage Guidelines of Samsung CXL Memory Module Hybrid Prototype
by: Zeng, Jianping, et al.
Published: (2025)
Descriptor-Based Object-Aware Memory Systems: A Comprehensive Review
by: Tong, Dong
Published: (2025)
CMD: A Cache-assisted GPU Memory Deduplication Architecture
by: Zhao, Wei, et al.
Published: (2024)
PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System
by: Liu, Lian, et al.
Published: (2026)
HCiM: ADC-Less Hybrid Analog-Digital Compute in Memory Accelerator for Deep Learning Workloads
by: Negi, Shubham, et al.
Published: (2024)
Mainframe-Style Channel Controllers for Modern Disaggregated Memory Systems
by: Liu, Zikai, et al.
Published: (2025)
Revisiting VerilogEval: A Year of Improvements in Large-Language Models for Hardware Code Generation
by: Pinckney, Nathaniel, et al.
Published: (2024)
Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access
by: Wang, Luming, et al.
Published: (2024)
Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
by: Yuan, Aojie, et al.
Published: (2026)
Choreographer: A Full-System Framework for Fine-Grained Tasks in Cache Hierarchies
by: Nguyen, Hoa, et al.
Published: (2025)
Enabling Efficient Hardware Acceleration of Hybrid Vision Transformer (ViT) Networks at the Edge
by: Dumoulin, Joren, et al.
Published: (2025)
CXL-Interference: Analysis and Characterization in Modern Computer Systems
by: Mao, Shunyu, et al.
Published: (2024)
Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing
by: Wang, Chuanzhen, et al.
Published: (2026)
Control Flow Management in Modern GPUs
by: Shoushtary, Mojtaba Abaie, et al.
Published: (2024)
Analyzing Modern NVIDIA GPU cores
by: Huerta, Rodrigo, et al.
Published: (2025)
RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory
by: Ghiasi, Nika Mansouri, et al.
Published: (2022)
A Review of SRAM-based Compute-in-Memory Circuits
by: Yoshioka, Kentaro, et al.
Published: (2024)
Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References
by: Chen, Hongzheng, et al.
Published: (2025)
AutoRAC: Automated Processing-in-Memory Accelerator Design for Recommender Systems
by: Cheng, Feng, et al.
Published: (2025)
LMB: Augmenting PCIe Devices with CXL-Linked Memory Buffer
by: Wang, Jiapin, et al.
Published: (2024)
An Event-Driven Spiking Compute-In-Memory Macro based on SOT-MRAM
by: Yu, Deyang, et al.
Published: (2025)
NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
by: Zhou, Zhe, et al.
Published: (2024)
Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression
by: Cheng, Feng, et al.
Published: (2025)
No One-Size-Fits-All: A Workload-Driven Characterization of Bit-Parallel vs. Bit-Serial Data Layouts for Processing-using-Memory
by: Zhang, Jingyao, et al.
Published: (2025)