| Main Authors: | Cionca, Victor, Szabo, Ferenc, Vasilev, Stanimir, Smyth, Dylan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.04498 |
Similar Items
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC
by: Lin, Wei-Chen, et al.
Published: (2024)
Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP
by: Rahimi, Ghazal, et al.
Published: (2026)
Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study
by: Debnath, Shimul, et al.
Published: (2026)
Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
by: Ng, Nathan, et al.
Published: (2026)
Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving
by: Sun, Tingyang, et al.
Published: (2026)
Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)
Beyond Thread States: Diagnosing Performance Degradation with eBPF and Thread Dynamics
by: Landau, Diogo, et al.
Published: (2026)
QoSFlow: Ensuring Service Quality of Distributed Workflows Using Interpretable Sensitivity Models
by: Rashid, Md Hasanur, et al.
Published: (2026)
High-Performance Portable GPU Primitives for Arbitrary Types and Operators in Julia
by: Pilliat, Emmanuel
Published: (2026)
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
by: Arif, Moiz, et al.
Published: (2026)
Performance Optimization in Stream Processing Systems: Experiment-Driven Configuration Tuning for Kafka Streams
by: Chen, David, et al.
Published: (2026)
PASTA: A Modular Program Analysis Tool Framework for Accelerators
by: Lin, Mao, et al.
Published: (2026)
CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)
The Energy Cost of Execution-Idle in GPU Clusters
by: Lei, Yiran, et al.
Published: (2026)
ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations
by: Boudaoud, Afif, et al.
Published: (2026)
Modeling the Impact of Fiber Latency on Compute-Communication Overlap in Geo-Distributed Multi-Datacenter AI Training
by: Papavasileiou, Ioannis, et al.
Published: (2026)
Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
by: Grbic, Dragana
Published: (2026)
Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM
by: Ren, Xuanzhengbo, et al.
Published: (2026)
LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
by: Xia, Yuning, et al.
Published: (2026)
DataStates-LLM: Scalable Checkpointing for Transformer Models Using Composable State Providers
by: Maurya, Avinash, et al.
Published: (2026)
DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System
by: Rashid, Md Hasanur, et al.
Published: (2026)
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
by: Lin, Mao, et al.
Published: (2026)
A Multi-Port Concurrent Communication Model for handling Compute Intensive Tasks on Distributed Satellite System Constellations
by: Veeravalli, Bharadwaj
Published: (2026)
Operational Strategies for Non-Disruptive Scheduling Transitions in Production HPC Systems
by: MacLachlan, Glen, et al.
Published: (2026)
FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow
by: Heidari, Sina, et al.
Published: (2026)
KEET: Explaining Performance of GPU Kernels Using LLM Agents
by: Davis, Joshua H., et al.
Published: (2026)
Shifting the Sweet Spot: High-Performance Matrix-Free Method for High-Order Elasticity
by: Chang, Dali, et al.
Published: (2026)
Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL
by: Thüring, Tim, et al.
Published: (2026)
Energy-Aware Computing in the Year 2026
by: Tchakoute, Roblex Nana, et al.
Published: (2026)
Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects
by: Taylor, Maya, et al.
Published: (2026)
Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs
by: Cornelius, Melanie, et al.
Published: (2025)
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)
Profiling and optimization of multi-card GPU machine learning jobs
by: Lawenda, Marcin, et al.
Published: (2025)
Optimal Parallel Scheduling under Concave Speedup Functions
by: Li, Chengzhang, et al.
Published: (2025)
WebAssembly and Unikernels: A Comparative Study for Serverless at the Edge
by: Besozzi, Valerio, et al.
Published: (2025)
Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)
Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
by: Mao, Ying, et al.
Published: (2020)
Staging Blocked Evaluation over Structured Sparse Matrices
by: Das, Pratyush, et al.
Published: (2024)
Reducing Tail Latencies Through Environment- and Neighbour-aware Thread Management
by: Jeffery, Andrew, et al.
Published: (2024)
Dissecting the software-based measurement of CPU energy consumption: a comparative analysis
by: Raffin, Guillaume, et al.
Published: (2024)