:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Fangxin; Zhang, Qinghua; Shen, Hanjing; Liang, Zhibo; Jiang, Li; Guan, Haibing; Bao, Chong; Jin, Xuefeng
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Hardware Architecture
Online Access:	https://arxiv.org/abs/2602.00748

Similar Items

Efficient Architecture for RISC-V Vector Memory Access
by: Guan, Hongyi, et al.
Published: (2025)

PIUMA: Programmable Integrated Unified Memory Architecture
by: Aananthakrishnan, Sriram, et al.
Published: (2020)

Enabling Time-Aware Priority Traffic Management over Distributed FPGA Nodes
by: Scionti, Alberto, et al.
Published: (2025)

Optimizing Offload Performance in Heterogeneous MPSoCs
by: Colagrande, Luca, et al.
Published: (2024)

Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading
by: Meng, William, et al.
Published: (2025)

New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
by: Oliveira, Geraldo F.
Published: (2025)

Knowledge-Guided Attention-Inspired Learning for Task Offloading in Vehicle Edge Computing
by: Ma, Ke, et al.
Published: (2025)

DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication
by: Pati, Suchita, et al.
Published: (2025)

Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis
by: Chu, Xiaoyu, et al.
Published: (2024)

Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures
by: Jarmusch, Aaron, et al.
Published: (2026)

Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization
by: Colagrande, Luca, et al.
Published: (2025)

FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics
by: Wang, Zeke, et al.
Published: (2025)

RailX: A Flexible, Scalable, and Low-Cost Network Architecture for Hyper-Scale LLM Training Systems
by: Feng, Yinxiao, et al.
Published: (2025)

Memory-Centric Computing: Solving Computing's Memory Problem
by: Mutlu, Onur, et al.
Published: (2025)

Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes
by: McDaniel, Adam, et al.
Published: (2026)

cMPI: Using CXL Memory Sharing for MPI One-Sided and Two-Sided Inter-Node Communications
by: Wang, Xi, et al.
Published: (2025)

Analyzing a Two-Tier Disaggregated Memory Protection Scheme Based on Memory Replication
by: Volos, Haris, et al.
Published: (2025)

General-Purpose Multicore Architectures
by: Ghose, Saugata
Published: (2024)

Dynamic Simultaneous Multithreaded Architecture
by: Ortiz-Arroyo, Daniel, et al.
Published: (2024)

A Modern Primer on Processing in Memory
by: Mutlu, Onur, et al.
Published: (2020)

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention
by: Wang, Haoxuan, et al.
Published: (2026)

SpArch: Efficient Architecture for Sparse Matrix Multiplication
by: Zhang, Zhekai, et al.
Published: (2020)

Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)

Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)

Memory-Centric Computing: Recent Advances in Processing-in-DRAM
by: Mutlu, Onur, et al.
Published: (2024)

HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads
by: Garg, Raveesh, et al.
Published: (2025)

MOFCO: Mobility- and Migration-Aware Task Offloading in Three-Layer Fog Computing Environments
by: Mahdizadeh, Soheil, et al.
Published: (2025)

Navigating the Landscape of Distributed File Systems: Architectures, Implementations, and Considerations
by: Pan, Xueting, et al.
Published: (2024)

FengHuang: Next-Generation Memory Orchestration for AI Inferencing
by: Li, Jiamin, et al.
Published: (2025)

Handling of Memory Page Faults during Virtual-Address RDMA
by: Psistakis, Antonis
Published: (2025)

UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing
by: Ran, Zhuoheng, et al.
Published: (2025)

DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications
by: Orenes-Vera, Marcelo, et al.
Published: (2023)

iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience
by: Muntaka, Siddique Abubakr, et al.
Published: (2026)

A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
by: Sharma, Harsh, et al.
Published: (2023)

Pooling Engram Conditional Memory in Large Language Models using CXL
by: Ma, Ruiyang, et al.
Published: (2026)

Investigating Memory Failure Prediction Across CPU Architectures
by: Yu, Qiao, et al.
Published: (2024)

TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-up Cluster Design with High Bandwidth Main Memory Link
by: Zhang, Yichao, et al.
Published: (2026)

How Fast Can Graph Computations Go on Fine-grained Parallel Architectures
by: Wang, Yuqing, et al.
Published: (2025)

BlockAMC: Scalable In-Memory Analog Matrix Computing for Solving Linear Systems
by: Pan, Lunshuai, et al.
Published: (2024)

Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters
by: Wang, Jing, et al.
Published: (2025)