| Main Authors: | Poptani, Akash, Khadem, Alireza, Mahlke, Scott, Miller, Jonah, Dolence, Joshua, Das, Reetuparna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.19701 |
Similar Items
Performance Impact of Containerized METADOCK 2 on Heterogeneous Platforms
by: Banegas-Luna, Antonio Jesús, et al.
Published: (2025)
Opt4GPTQ: Co-Optimizing Memory and Computation for 4-bit GPTQ Quantized LLM Inference on Heterogeneous Platforms
by: Zhang, Yaozheng, et al.
Published: (2025)
THAPI: Tracing Heterogeneous APIs
by: Bekele, Solomon, et al.
Published: (2025)
Understanding Power Consumption Metric on Heterogeneous Memory Systems
by: Proaño, Andrès Rubio, et al.
Published: (2024)
Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
by: Mao, Ying, et al.
Published: (2020)
Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
by: Grbic, Dragana
Published: (2026)
Modeling and Characterizing Service Interference in Dynamic Infrastructures
by: Medel, Víctor, et al.
Published: (2024)
Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL
by: Thüring, Tim, et al.
Published: (2026)
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)
An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models
by: Chu, Xiaoyu, et al.
Published: (2025)
Active Inference-Based Adaptive Routing for Heterogeneous Edge AI Services
by: Wang, Zihang, et al.
Published: (2026)
MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations
by: Dutta, Akash, et al.
Published: (2024)
Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey
by: Ather, Hammad, et al.
Published: (2024)
CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)
Staging Blocked Evaluation over Structured Sparse Matrices
by: Das, Pratyush, et al.
Published: (2024)
Taking GPU Programming Models to Task for Performance Portability
by: Davis, Joshua H., et al.
Published: (2024)
KEET: Explaining Performance of GPU Kernels Using LLM Agents
by: Davis, Joshua H., et al.
Published: (2026)
CGSim: A Simulation Framework for Large Scale Distributed Computing Environment
by: Vatsavai, Sairam Sri, et al.
Published: (2025)
Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
by: Lin, Zhongyi, et al.
Published: (2024)
Demystifying Serverless Costs on Public Platforms: Bridging Billing, Architecture, and OS Scheduling
by: Lin, Changyuan, et al.
Published: (2025)
CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference
by: Xu, Guanyu, et al.
Published: (2025)
Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs
by: Cornelius, Melanie, et al.
Published: (2025)
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)
Profiling and optimization of multi-card GPU machine learning jobs
by: Lawenda, Marcin, et al.
Published: (2025)
Optimal Parallel Scheduling under Concave Speedup Functions
by: Li, Chengzhang, et al.
Published: (2025)
WebAssembly and Unikernels: A Comparative Study for Serverless at the Edge
by: Besozzi, Valerio, et al.
Published: (2025)
Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)
Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study
by: Debnath, Shimul, et al.
Published: (2026)
Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
by: Ng, Nathan, et al.
Published: (2026)
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC
by: Lin, Wei-Chen, et al.
Published: (2024)
Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving
by: Sun, Tingyang, et al.
Published: (2026)
Reducing Tail Latencies Through Environment- and Neighbour-aware Thread Management
by: Jeffery, Andrew, et al.
Published: (2024)
Dissecting the software-based measurement of CPU energy consumption: a comparative analysis
by: Raffin, Guillaume, et al.
Published: (2024)
Bridging OT and PaaS in Edge-to-Cloud Continuum
by: Barrios, Carlos J, et al.
Published: (2025)
RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)
Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)
Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP
by: Rahimi, Ghazal, et al.
Published: (2026)
Node Compass: Multilevel Tracing and Debugging of Request Executions in JavaScript-Based Web-Servers
by: Kabamba, Herve Mbikayi, et al.
Published: (2023)
Beyond Thread States: Diagnosing Performance Degradation with eBPF and Thread Dynamics
by: Landau, Diogo, et al.
Published: (2026)
Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes
by: Berg, Benjamin, et al.
Published: (2024)