| Main Author: | Graziano, Marco |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.10030 |
Similar Items
DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication
by: Pati, Suchita, et al.
Published: (2025)
Intent-Driven Storage Systems: From Low-Level Tuning to High-Level Understanding
by: Bergman, Shai, et al.
Published: (2025)
DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management
by: Zhou, Zhongchun, et al.
Published: (2025)
Torrent: A Distributed DMA for Efficient and Flexible Point-to-Multipoint Data Movement
by: Deng, Yunhao, et al.
Published: (2025)
Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework
by: Stojkovic, Jovan, et al.
Published: (2025)
SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
by: Li, Jonathan, et al.
Published: (2025)
Improving AI Efficiency in Data Centres by Power Dynamic Response
by: Marinoni, Andrea, et al.
Published: (2025)
Optimizing ML Concurrent Computation and Communication with GPU DMA Engines
by: Agrawal, Anirudha, et al.
Published: (2024)
Exploring energy consumption of AI frameworks on a 64-core RV64 Server CPU
by: Malenza, Giulio, et al.
Published: (2025)
IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment
by: Psistakis, Antonis
Published: (2025)
XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs
by: Kong, Fanchen, et al.
Published: (2025)
Power Stabilization for AI Training Datacenters
by: Choukse, Esha, et al.
Published: (2025)
Strict Partitioning for Sporadic Rigid Gang Tasks
by: Sun, Binqi, et al.
Published: (2024)
Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference
by: Zhao, Yiren, et al.
Published: (2026)
Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale
by: Zhao, Dan, et al.
Published: (2024)
Debunking the CUDA Myth Towards GPU-based AI Systems
by: Lee, Yunjae, et al.
Published: (2024)
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
by: Stojkovic, Jovan, et al.
Published: (2024)
Modernizing Amdahl's Law: How AI Scaling Laws Shape Computer Architecture
by: Lu, Chien-Ping
Published: (2026)
Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator
by: Peccia, Federico Nicolas, et al.
Published: (2024)
FengHuang: Next-Generation Memory Orchestration for AI Inferencing
by: Li, Jiamin, et al.
Published: (2025)
RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators
by: Tang, Xinsheng, et al.
Published: (2026)
Good things come in small packages: Should we build AI clusters with Lite-GPUs?
by: Canakci, Burcu, et al.
Published: (2025)
TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading
by: Pan, Yudong, et al.
Published: (2026)
COMET: Neural Cost Model Explanation Framework
by: Chaudhary, Isha, et al.
Published: (2023)
Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications
by: Li, Jiaxi, et al.
Published: (2025)
DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics
by: Cao, Yingqi, et al.
Published: (2024)
Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception
by: Odema, Mohanad, et al.
Published: (2024)
CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
by: Garg, Raveesh, et al.
Published: (2023)
Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge
by: Chen, Jiesong, et al.
Published: (2026)
HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures
by: Liu, Fangxin, et al.
Published: (2026)
ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation
by: Dorairaj, Nij, et al.
Published: (2026)
NPU Design for Diffusion Language Model Inference
by: Lou, Binglei, et al.
Published: (2026)
Forge-UGC: FX optimization and register-graph engine for universal graph compiler
by: Kumar, Satyam, et al.
Published: (2026)
PhD Thesis Summary: Methods for Reliability Assessment and Enhancement of Deep Neural Network Hardware Accelerators
by: Taheri, Mahdi
Published: (2026)
A Scalable NorthPole System with End-to-End Vertical Integration for Low-Latency and Energy-Efficient LLM Inference
by: DeBole, Michael V., et al.
Published: (2025)
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
by: Qin, Ruoyu, et al.
Published: (2024)
PiKV: KV Cache Management System for Mixture of Experts
by: Liu, Dong, et al.
Published: (2025)
Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator
by: Hokenmaier, W, et al.
Published: (2024)
PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving
by: Yüzügüler, Ahmet Caner, et al.
Published: (2025)
Investigating Memory Failure Prediction Across CPU Architectures
by: Yu, Qiao, et al.
Published: (2024)