| Main Authors: | Cao, Ruide, Qi, Zhuyun, He, Qinyang, Ling, Chenxi, Wang, Yi, Tang, Guoming |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.17882 |
Similar Items
BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters
by: Zhang, Kunming, et al.
Published: (2025)
SLURM Heterogeneous Jobs for Hybrid Classical-Quantum Workflows
by: Esposito, Aniello, et al.
Published: (2025)
An Autonomy Loop for Dynamic HPC Job Time Limit Adjustment
by: Jakobsche, Thomas, et al.
Published: (2025)
Optimization-based Proof of Useful Work: Framework, Modeling, and Security Analysis
by: Cao, Weihang, et al.
Published: (2024)
Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
by: Luo, Yizhou, et al.
Published: (2024)
Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions
by: Zhao, Hailiang, et al.
Published: (2024)
Energy-Efficient Real-Time Job Mapping and Resource Management in Mobile-Edge Computing
by: Gao, Chuanchao, et al.
Published: (2025)
Exploiting the Uncertainty of the Longest Paths: Response Time Analysis for Probabilistic DAG Tasks
by: Gao, Yiyang, et al.
Published: (2025)
LLload: Simplifying Real-Time Job Monitoring for HPC Users
by: Byun, Chansup, et al.
Published: (2024)
A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications
by: Chen, Lei, et al.
Published: (2024)
MicroPython Testbed for Federated Learning Algorithms
by: Popovic, Miroslav, et al.
Published: (2024)
Accelerating Python Applications with Dask and ProxyStore
by: Pauloski, J. Gregory, et al.
Published: (2024)
Performance Evaluation of Automated Multi-Service Deployment in Edge-Cloud Environments with the CODECO Toolkit
by: Koukis, Georgios, et al.
Published: (2026)
An Elastic Job Scheduler for HPC Applications on the Cloud
by: Bhosale, Aditya, et al.
Published: (2025)
Parsl+CWL: Towards Combining the Python and CWL Ecosystems
by: Karle, Nishchay, et al.
Published: (2024)
Multi-Event Triggers for Serverless Computing
by: Carl, Natalie, et al.
Published: (2025)
Scalable HPC Job Scheduling and Resource Management in SST
by: Abdurahman, Abubeker, et al.
Published: (2025)
FaaS Is Not Enough: Serverless Handling of Burst-Parallel Jobs
by: Barcelona-Pons, Daniel, et al.
Published: (2024)
iAnomaly: A Toolkit for Generating Performance Anomaly Datasets in Edge-Cloud Integrated Computing Environments
by: Fernando, Duneesha, et al.
Published: (2024)
Deep Back-Filling: a Split Window Technique for Deep Online Cluster Job Scheduling
by: Wang, Lingfei, et al.
Published: (2024)
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis
by: Chu, Xiaoyu, et al.
Published: (2024)
Dflow, a Python framework for constructing cloud-native AI-for-Science workflows
by: Liu, Xinzijian, et al.
Published: (2024)
DGNNFlow: A Streaming Dataflow Architecture for Real-Time Edge-based Dynamic GNN Inference in HL-LHC Trigger Systems
by: Maharaj, Davendra, et al.
Published: (2026)
DMRlib: Easy-coding and Efficient Resource Management for Job Malleability
by: Iserte, Sergio, et al.
Published: (2026)
Incisor: Ex Ante Cloud Instance Selection for HPC Jobs
by: Laurenzano, Michael A., et al.
Published: (2026)
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
by: Zhang, Xinyi, et al.
Published: (2024)
Serving Chain-structured Jobs with Large Memory Footprints with Application to Large Foundation Model Serving
by: Sun, Tingyang, et al.
Published: (2026)
Understanding GPU Triggering APIs for MPI+X Communication
by: Bridges, Patrick G., et al.
Published: (2024)
Metronome: Efficient Scheduling for Periodic Traffic Jobs with Network and Priority Awareness
by: Jiang, Hao, et al.
Published: (2025)
PSI/J: A Portable Interface for Submitting, Monitoring, and Managing Jobs
by: Hategan-Marandiuc, Mihael, et al.
Published: (2023)
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs
by: Kolker-Hicks, Elliot, et al.
Published: (2024)
Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation
by: Wu, Tianyuan, et al.
Published: (2025)
Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes
by: Berg, Benjamin, et al.
Published: (2024)
In Situ In Transit Hybrid Analysis with Catalyst-ADIOS2
by: Mazen, François, et al.
Published: (2024)
Quantifying the Carbon Reduction of DAG Workloads: A Job Shop Scheduling Perspective
by: Bostandoost, Roozbeh, et al.
Published: (2025)
Evaluating Malleable Job Scheduling in HPC Clusters using Real-World Workloads
by: Zojer, Patrick, et al.
Published: (2026)
Fault-Tolerant Hybrid-Parallel Training at Scale with Reliable and Efficient In-memory Checkpointing
by: Wang, Yuxin, et al.
Published: (2023)
Flora: Efficient Cloud Resource Selection for Big Data Processing via Job Classification
by: Will, Jonathan, et al.
Published: (2025)
SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling
by: Amrizal, Muhammad Alfian, et al.
Published: (2025)
Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps
by: Arima, Eishi, et al.
Published: (2024)