| Main Authors: | Liu, Fangxin, Zhang, Qinghua, Shen, Hanjing, Liang, Zhibo, Jiang, Li, Guan, Haibing, Bao, Chong, Jin, Xuefeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.00748 |
Similar Items
Efficient Architecture for RISC-V Vector Memory Access
by: Guan, Hongyi, et al.
Published: (2025)
PIUMA: Programmable Integrated Unified Memory Architecture
by: Aananthakrishnan, Sriram, et al.
Published: (2020)
Enabling Time-Aware Priority Traffic Management over Distributed FPGA Nodes
by: Scionti, Alberto, et al.
Published: (2025)
Optimizing Offload Performance in Heterogeneous MPSoCs
by: Colagrande, Luca, et al.
Published: (2024)
Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading
by: Meng, William, et al.
Published: (2025)
New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
by: Oliveira, Geraldo F.
Published: (2025)
Knowledge-Guided Attention-Inspired Learning for Task Offloading in Vehicle Edge Computing
by: Ma, Ke, et al.
Published: (2025)
DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication
by: Pati, Suchita, et al.
Published: (2025)
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis
by: Chu, Xiaoyu, et al.
Published: (2024)
Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures
by: Jarmusch, Aaron, et al.
Published: (2026)
Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization
by: Colagrande, Luca, et al.
Published: (2025)
FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics
by: Wang, Zeke, et al.
Published: (2025)
RailX: A Flexible, Scalable, and Low-Cost Network Architecture for Hyper-Scale LLM Training Systems
by: Feng, Yinxiao, et al.
Published: (2025)
Memory-Centric Computing: Solving Computing's Memory Problem
by: Mutlu, Onur, et al.
Published: (2025)
Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes
by: McDaniel, Adam, et al.
Published: (2026)
cMPI: Using CXL Memory Sharing for MPI One-Sided and Two-Sided Inter-Node Communications
by: Wang, Xi, et al.
Published: (2025)
Analyzing a Two-Tier Disaggregated Memory Protection Scheme Based on Memory Replication
by: Volos, Haris, et al.
Published: (2025)
General-Purpose Multicore Architectures
by: Ghose, Saugata
Published: (2024)
Dynamic Simultaneous Multithreaded Architecture
by: Ortiz-Arroyo, Daniel, et al.
Published: (2024)
A Modern Primer on Processing in Memory
by: Mutlu, Onur, et al.
Published: (2020)
HieraSparse: Hierarchical Semi-Structured Sparse KV Attention
by: Wang, Haoxuan, et al.
Published: (2026)
SpArch: Efficient Architecture for Sparse Matrix Multiplication
by: Zhang, Zhekai, et al.
Published: (2020)
Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)
Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
by: Mutlu, Onur, et al.
Published: (2024)
HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads
by: Garg, Raveesh, et al.
Published: (2025)
MOFCO: Mobility- and Migration-Aware Task Offloading in Three-Layer Fog Computing Environments
by: Mahdizadeh, Soheil, et al.
Published: (2025)
Navigating the Landscape of Distributed File Systems: Architectures, Implementations, and Considerations
by: Pan, Xueting, et al.
Published: (2024)
FengHuang: Next-Generation Memory Orchestration for AI Inferencing
by: Li, Jiamin, et al.
Published: (2025)
Handling of Memory Page Faults during Virtual-Address RDMA
by: Psistakis, Antonis
Published: (2025)
UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing
by: Ran, Zhuoheng, et al.
Published: (2025)
DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications
by: Orenes-Vera, Marcelo, et al.
Published: (2023)
iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience
by: Muntaka, Siddique Abubakr, et al.
Published: (2026)
A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
by: Sharma, Harsh, et al.
Published: (2023)
Pooling Engram Conditional Memory in Large Language Models using CXL
by: Ma, Ruiyang, et al.
Published: (2026)
Investigating Memory Failure Prediction Across CPU Architectures
by: Yu, Qiao, et al.
Published: (2024)
TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-up Cluster Design with High Bandwidth Main Memory Link
by: Zhang, Yichao, et al.
Published: (2026)
How Fast Can Graph Computations Go on Fine-grained Parallel Architectures
by: Wang, Yuqing, et al.
Published: (2025)
BlockAMC: Scalable In-Memory Analog Matrix Computing for Solving Linear Systems
by: Pan, Lunshuai, et al.
Published: (2024)
Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters
by: Wang, Jing, et al.
Published: (2025)