:: Library Catalog

Bibliographic Details
Main Author:	Graziano, Marco
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2603.10030

Similar Items

DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication
by: Pati, Suchita, et al.
Published: (2025)

Intent-Driven Storage Systems: From Low-Level Tuning to High-Level Understanding
by: Bergman, Shai, et al.
Published: (2025)

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management
by: Zhou, Zhongchun, et al.
Published: (2025)

Torrent: A Distributed DMA for Efficient and Flexible Point-to-Multipoint Data Movement
by: Deng, Yunhao, et al.
Published: (2025)

Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework
by: Stojkovic, Jovan, et al.
Published: (2025)

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
by: Li, Jonathan, et al.
Published: (2025)

Improving AI Efficiency in Data Centres by Power Dynamic Response
by: Marinoni, Andrea, et al.
Published: (2025)

Optimizing ML Concurrent Computation and Communication with GPU DMA Engines
by: Agrawal, Anirudha, et al.
Published: (2024)

Exploring energy consumption of AI frameworks on a 64-core RV64 Server CPU
by: Malenza, Giulio, et al.
Published: (2025)

IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment
by: Psistakis, Antonis
Published: (2025)

XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs
by: Kong, Fanchen, et al.
Published: (2025)

Power Stabilization for AI Training Datacenters
by: Choukse, Esha, et al.
Published: (2025)

Strict Partitioning for Sporadic Rigid Gang Tasks
by: Sun, Binqi, et al.
Published: (2024)

Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference
by: Zhao, Yiren, et al.
Published: (2026)

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale
by: Zhao, Dan, et al.
Published: (2024)

Debunking the CUDA Myth Towards GPU-based AI Systems
by: Lee, Yunjae, et al.
Published: (2024)

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
by: Stojkovic, Jovan, et al.
Published: (2024)

Modernizing Amdahl's Law: How AI Scaling Laws Shape Computer Architecture
by: Lu, Chien-Ping
Published: (2026)

Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator
by: Peccia, Federico Nicolas, et al.
Published: (2024)

FengHuang: Next-Generation Memory Orchestration for AI Inferencing
by: Li, Jiamin, et al.
Published: (2025)

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators
by: Tang, Xinsheng, et al.
Published: (2026)

Good things come in small packages: Should we build AI clusters with Lite-GPUs?
by: Canakci, Burcu, et al.
Published: (2025)

TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading
by: Pan, Yudong, et al.
Published: (2026)

COMET: Neural Cost Model Explanation Framework
by: Chaudhary, Isha, et al.
Published: (2023)

Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications
by: Li, Jiaxi, et al.
Published: (2025)

DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics
by: Cao, Yingqi, et al.
Published: (2024)

Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception
by: Odema, Mohanad, et al.
Published: (2024)

CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
by: Garg, Raveesh, et al.
Published: (2023)

Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge
by: Chen, Jiesong, et al.
Published: (2026)

HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures
by: Liu, Fangxin, et al.
Published: (2026)

ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation
by: Dorairaj, Nij, et al.
Published: (2026)

NPU Design for Diffusion Language Model Inference
by: Lou, Binglei, et al.
Published: (2026)

Forge-UGC: FX optimization and register-graph engine for universal graph compiler
by: Kumar, Satyam, et al.
Published: (2026)

PhD Thesis Summary: Methods for Reliability Assessment and Enhancement of Deep Neural Network Hardware Accelerators
by: Taheri, Mahdi
Published: (2026)

A Scalable NorthPole System with End-to-End Vertical Integration for Low-Latency and Energy-Efficient LLM Inference
by: DeBole, Michael V., et al.
Published: (2025)

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
by: Qin, Ruoyu, et al.
Published: (2024)

PiKV: KV Cache Management System for Mixture of Experts
by: Liu, Dong, et al.
Published: (2025)

Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator
by: Hokenmaier, W, et al.
Published: (2024)

PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving
by: Yüzügüler, Ahmet Caner, et al.
Published: (2025)

Investigating Memory Failure Prediction Across CPU Architectures
by: Yu, Qiao, et al.
Published: (2024)