| Main Authors: | Shin, Injae, Tine, Blaise |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.17602 |
Similar Items
Ten-Four: An Open-Source Fused Dot Product Unit for Mixed-Precision GPGPU Tensor Cores
by: Rout, Nikhil, et al.
Published: (2025)
Inside VOLT: Designing an Open-Source GPU Compiler
by: Jeong, Shinnung, et al.
Published: (2025)
Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU
by: Pu, Huanzhi, et al.
Published: (2025)
CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies
by: Gouk, Donghyun, et al.
Published: (2025)
CMD: A Cache-assisted GPU Memory Deduplication Architecture
by: Zhao, Wei, et al.
Published: (2024)
Five-Minute Rule 40 Years Later: A First-Principles Revisit for Modern Memory Hierarchy
by: Zhang, Tong, et al.
Published: (2025)
Apparate: Evading Memory Hierarchy with GodSpeed Wireless-on-Chip
by: GS, Nitesh Narayana, et al.
Published: (2024)
PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2024)
e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications
by: Machetti, Simone, et al.
Published: (2025)
Theodosian: A Deep Dive into Memory-Hierarchy-Centric FHE Acceleration
by: Choi, Wonseok, et al.
Published: (2025)
A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator
by: Bause, Oliver, et al.
Published: (2024)
OpenGL GPU-Based Rowhammer Attack (Work in Progress)
by: Plin, Antoine, et al.
Published: (2025)
Optimized Memory System Architecture for VESA VDC-M Decoder with Multi-Slice Support
by: Yang, Hannah, et al.
Published: (2025)
RoboGPU: Accelerating GPU Collision Detection for Robotics
by: Liu, Lufei, et al.
Published: (2026)
Analyzing Modern NVIDIA GPU cores
by: Huerta, Rodrigo, et al.
Published: (2025)
Memory Hierarchy Design for Caching Middleware in the Age of NVM
by: Ghandeharizadeh, Shahram, et al.
Published: (2025)
RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures
by: Iff, Patrick, et al.
Published: (2023)
Design of a GPU with Heterogeneous Cores for Graphics
by: Tomás, Aurora, et al.
Published: (2026)
Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
by: Luo, Weile, et al.
Published: (2024)
COOK Access Control on an embedded Volta GPU
by: Lesage, Benjamin, et al.
Published: (2024)
täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)
by: Srinivasan, Pranav, et al.
Published: (2026)
Choreographer: A Full-System Framework for Fine-Grained Tasks in Cache Hierarchies
by: Nguyen, Hoa, et al.
Published: (2025)
Efficient Open Modification Spectral Library Searching in High-Dimensional Space with Multi-Level-Cell Memory
by: Fan, Keming, et al.
Published: (2024)
CuLifter: Lifting GPU Binaries to Typed IR
by: Zhao, Jisheng, et al.
Published: (2026)
All-rounder: A Flexible AI Accelerator with Diverse Data Format Support and Morphable Structure for Multi-DNN Processing
by: Noh, Seock-Hwan, et al.
Published: (2023)
GAP-LA: GPU-Accelerated Performance-Driven Layer Assignment
by: Zhao, Chunyuan, et al.
Published: (2025)
Thermal Analysis for NVIDIA GTX480 Fermi GPU Architecture
by: Nagendra, Savinay
Published: (2024)
Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather
by: Shin, Changmin, et al.
Published: (2025)
Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
by: Ham, Hyungkyu, et al.
Published: (2024)
Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM
by: Liu, Lian, et al.
Published: (2025)
The Anatomy of Silent Data Corruption: GPU Error Pattern Study and Modeling Guidance
by: Tung, Chung-Hsuan, et al.
Published: (2026)
Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node
by: Latif, Imran, et al.
Published: (2024)
EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads
by: Lee, Kyungmi, et al.
Published: (2026)
Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems
by: Gundawar, Ayush, et al.
Published: (2024)
Edge GPU Aware Multiple AI Model Pipeline for Accelerated MRI Reconstruction and Analysis
by: Majeed, Ashiyana Abdul, et al.
Published: (2025)
GPU-Accelerated Simulated Oscillator Ising/Potts Machine Solving Combinatorial Optimization Problems
by: Gonul, Yilmaz Ege, et al.
Published: (2025)
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
by: Guan, Yue, et al.
Published: (2026)
Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing
by: Shin, Duckgyu, et al.
Published: (2026)
The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory
by: Volos, Haris
Published: (2023)
IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
by: Seo, Minseok, et al.
Published: (2024)