:: Library Catalog

Bibliographic Details
Main Authors:	Poptani, Akash; Khadem, Alireza; Mahlke, Scott; Miller, Jonah; Dolence, Joshua; Das, Reetuparna
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Performance
Online Access:	https://arxiv.org/abs/2509.19701

Similar Items

Performance Impact of Containerized METADOCK 2 on Heterogeneous Platforms
by: Banegas-Luna, Antonio Jesús, et al.
Published: (2025)

Opt4GPTQ: Co-Optimizing Memory and Computation for 4-bit GPTQ Quantized LLM Inference on Heterogeneous Platforms
by: Zhang, Yaozheng, et al.
Published: (2025)

THAPI: Tracing Heterogeneous APIs
by: Bekele, Solomon, et al.
Published: (2025)

Understanding Power Consumption Metric on Heterogeneous Memory Systems
by: Proaño, Andrès Rubio, et al.
Published: (2024)

Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
by: Mao, Ying, et al.
Published: (2020)

Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
by: Grbic, Dragana
Published: (2026)

Modeling and Characterizing Service Interference in Dynamic Infrastructures
by: Medel, Víctor, et al.
Published: (2024)

Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL
by: Thüring, Tim, et al.
Published: (2026)

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)

An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models
by: Chu, Xiaoyu, et al.
Published: (2025)

Active Inference-Based Adaptive Routing for Heterogeneous Edge AI Services
by: Wang, Zihang, et al.
Published: (2026)

MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations
by: Dutta, Akash, et al.
Published: (2024)

Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey
by: Ather, Hammad, et al.
Published: (2024)

CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)

Staging Blocked Evaluation over Structured Sparse Matrices
by: Das, Pratyush, et al.
Published: (2024)

Taking GPU Programming Models to Task for Performance Portability
by: Davis, Joshua H., et al.
Published: (2024)

KEET: Explaining Performance of GPU Kernels Using LLM Agents
by: Davis, Joshua H., et al.
Published: (2026)

CGSim: A Simulation Framework for Large Scale Distributed Computing Environment
by: Vatsavai, Sairam Sri, et al.
Published: (2025)

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
by: Lin, Zhongyi, et al.
Published: (2024)

Demystifying Serverless Costs on Public Platforms: Bridging Billing, Architecture, and OS Scheduling
by: Lin, Changyuan, et al.
Published: (2025)

CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference
by: Xu, Guanyu, et al.
Published: (2025)

Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs
by: Cornelius, Melanie, et al.
Published: (2025)

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)

Profiling and optimization of multi-card GPU machine learning jobs
by: Lawenda, Marcin, et al.
Published: (2025)

Optimal Parallel Scheduling under Concave Speedup Functions
by: Li, Chengzhang, et al.
Published: (2025)

WebAssembly and Unikernels: A Comparative Study for Serverless at the Edge
by: Besozzi, Valerio, et al.
Published: (2025)

Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)

Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study
by: Debnath, Shimul, et al.
Published: (2026)

Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
by: Ng, Nathan, et al.
Published: (2026)

Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC
by: Lin, Wei-Chen, et al.
Published: (2024)

Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving
by: Sun, Tingyang, et al.
Published: (2026)

Reducing Tail Latencies Through Environment- and Neighbour-aware Thread Management
by: Jeffery, Andrew, et al.
Published: (2024)

Dissecting the software-based measurement of CPU energy consumption: a comparative analysis
by: Raffin, Guillaume, et al.
Published: (2024)

Bridging OT and PaaS in Edge-to-Cloud Continuum
by: Barrios, Carlos J, et al.
Published: (2025)

RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)

Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)

Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP
by: Rahimi, Ghazal, et al.
Published: (2026)

Node Compass: Multilevel Tracing and Debugging of Request Executions in JavaScript-Based Web-Servers
by: Kabamba, Herve Mbikayi, et al.
Published: (2023)

Beyond Thread States: Diagnosing Performance Degradation with eBPF and Thread Dynamics
by: Landau, Diogo, et al.
Published: (2026)

Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes
by: Berg, Benjamin, et al.
Published: (2024)