:: Library Catalog

Bibliographic Details
Main Authors:	Esaulov, Vladislav; Chen, Jieyang; Podhorszki, Norbert; Suter, Fred; Klasky, Scott; Bourgeois, Anu G; Wan, Lipeng
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Performance
Online Access:	https://arxiv.org/abs/2506.17084

Similar Items

HPDR: High-Performance Portable Scientific Data Reduction Framework
by: Chen, Jieyang, et al.
Published: (2025)

Error-controlled Progressive Retrieval of Scientific Data under Derivable Quantities of Interest
by: Wu, Xuan, et al.
Published: (2024)

Enabling High-Throughput Parallel I/O in Particle-in-Cell Monte Carlo Simulations with openPMD and Darshan I/O Monitoring
by: Williams, Jeremy J., et al.
Published: (2024)

CGSim: A Simulation Framework for Large Scale Distributed Computing Environment
by: Vatsavai, Sairam Sri, et al.
Published: (2025)

HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs
by: Li, Yanliang, et al.
Published: (2025)

Automated Calibration of Parallel and Distributed Computing Simulators: A Case Study
by: McDonald, Jesse, et al.
Published: (2024)

Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE
by: Poptani, Akash, et al.
Published: (2025)

FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow
by: Heidari, Sina, et al.
Published: (2026)

QoSFlow: Ensuring Service Quality of Distributed Workflows Using Interpretable Sensitivity Models
by: Rashid, Md Hasanur, et al.
Published: (2026)

Denoising Application Performance Models with Noise-Resilient Priors
by: de Morais, Gustavo, et al.
Published: (2025)

RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)

Comprehensive Plugin-Based Monitoring of Nextflow Workflow Executions
by: Kharma, Sami, et al.
Published: (2026)

AARC: Automated Affinity-aware Resource Configuration for Serverless Workflows
by: Jin, Lingxiao, et al.
Published: (2025)

LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
by: Xia, Yuning, et al.
Published: (2026)

CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)

AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase
by: Nicusan, Andrei-Leonard, et al.
Published: (2025)

Towards Portability at Scale: A Cross-Architecture Performance Evaluation of a GPU-enabled Shallow Water Solver
by: Villalobos, Johansell, et al.
Published: (2025)

Iterating Pointers: Enabling Static Analysis for Loop-based Pointers
by: Lepori, Andrea, et al.
Published: (2025)

A Hybrid Heuristic Framework for Resource-Efficient Querying of Scientific Experiments Data
by: Patel, Mayank, et al.
Published: (2025)

Active Inference-Based Adaptive Routing for Heterogeneous Edge AI Services
by: Wang, Zihang, et al.
Published: (2026)

Optimal Allocation of Tasks and Price of Anarchy of Distributed Optimization in Networked Computing Facilities
by: Mancuso, Vincenzo, et al.
Published: (2024)

ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference
by: Shen, Zixu, et al.
Published: (2025)

AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage
by: Rashid, Md Hasanur, et al.
Published: (2026)

Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs
by: Cornelius, Melanie, et al.
Published: (2025)

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)

Profiling and optimization of multi-card GPU machine learning jobs
by: Lawenda, Marcin, et al.
Published: (2025)

Optimal Parallel Scheduling under Concave Speedup Functions
by: Li, Chengzhang, et al.
Published: (2025)

WebAssembly and Unikernels: A Comparative Study for Serverless at the Edge
by: Besozzi, Valerio, et al.
Published: (2025)

Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)

Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
by: Mao, Ying, et al.
Published: (2020)

Staging Blocked Evaluation over Structured Sparse Matrices
by: Das, Pratyush, et al.
Published: (2024)

Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study
by: Debnath, Shimul, et al.
Published: (2026)

Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
by: Ng, Nathan, et al.
Published: (2026)

Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC
by: Lin, Wei-Chen, et al.
Published: (2024)

Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving
by: Sun, Tingyang, et al.
Published: (2026)

Reducing Tail Latencies Through Environment- and Neighbour-aware Thread Management
by: Jeffery, Andrew, et al.
Published: (2024)

Dissecting the software-based measurement of CPU energy consumption: a comparative analysis
by: Raffin, Guillaume, et al.
Published: (2024)

Bridging OT and PaaS in Edge-to-Cloud Continuum
by: Barrios, Carlos J, et al.
Published: (2025)

Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)

Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP
by: Rahimi, Ghazal, et al.
Published: (2026)