| Main Authors: | Sritriratanarak, Warisa; Garcia, Paulo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.05817 |
Similar Items
Transforming Future Data Center Operations and Management via Physical AI
by: Cao, Zhiwei, et al.
Published: (2025)
ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture
by: Dobrin, Seth, et al.
Published: (2026)
Federated Learning for Cyber Physical Systems: A Comprehensive Survey
by: Quan, Minh K., et al.
Published: (2025)
DataCenterGym: A Physics-Grounded Simulator for Multi-Objective Data Center Scheduling
by: Pathak, Nilavra, et al.
Published: (2026)
AI Factories: It's time to rethink the Cloud-HPC divide
by: Lopez, Pedro Garcia, et al.
Published: (2025)
AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research
by: Heredia, Ignacio, et al.
Published: (2025)
FreeRide: Harvesting Bubbles in Pipeline Parallelism
by: Zhang, Jiashu, et al.
Published: (2024)
KunServe: Parameter-centric Memory Management for Efficient Memory Overloading Handling in LLM Serving
by: Cheng, Rongxin, et al.
Published: (2024)
Ensemble Method for System Failure Detection Using Large-Scale Telemetry Data
by: Mudgal, Priyanka, et al.
Published: (2024)
Topology-aware Preemptive Scheduling for Co-located LLM Workloads
by: Zhang, Ping, et al.
Published: (2024)
Can Large Language Models Write Parallel Code?
by: Nichols, Daniel, et al.
Published: (2024)
LLM as HPC Expert: Extending RAG Architecture for HPC Data
by: Miyashita, Yusuke, et al.
Published: (2024)
Boosting Asynchronous Decentralized Learning with Model Fragmentation
by: Biswas, Sayan, et al.
Published: (2024)
FedRAV: Hierarchically Federated Region-Learning for Traffic Object Classification of Autonomous Vehicles
by: Zhai, Yijun, et al.
Published: (2024)
FedPAW: Federated Learning with Personalized Aggregation Weights for Urban Vehicle Speed Prediction
by: He, Yuepeng, et al.
Published: (2024)
FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation
by: Palihawadana, Chamath, et al.
Published: (2024)
Isambard-AI: a leadership class supercomputer optimised specifically for Artificial Intelligence
by: McIntosh-Smith, Simon, et al.
Published: (2024)
SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile
by: Zhang, Ruisi, et al.
Published: (2024)
Dynamic Resource Allocation for Virtual Machine Migration Optimization using Machine Learning
by: Gong, Yulu, et al.
Published: (2024)
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
by: Cheng, Jialiang, et al.
Published: (2024)
Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors
by: Chen, Siyuan, et al.
Published: (2024)
Analytically-Driven Resource Management for Cloud-Native Microservices
by: Zhang, Yanqi, et al.
Published: (2024)
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
by: Yang, Yuting, et al.
Published: (2024)
Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and Framework
by: Jia, Ziye, et al.
Published: (2024)
Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint
by: Li, Jun, et al.
Published: (2024)
HETHUB: A Distributed Training System with Heterogeneous Cluster for Large-Scale Models
by: Xu, Si, et al.
Published: (2024)
Towards using Reinforcement Learning for Scaling and Data Replication in Cloud Systems
by: Mokadem, Riad, et al.
Published: (2024)
TS-EoH: An Edge Server Task Scheduling Algorithm Based on Evolution of Heuristic
by: Yatong, Wang, et al.
Published: (2024)
Reinforcement Learning-driven Data-intensive Workflow Scheduling for Volunteer Edge-Cloud
by: Mounesan, Motahare, et al.
Published: (2024)
ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving
by: Huang, Tao, et al.
Published: (2024)
Automated Road Safety: Enhancing Sign and Surface Damage Detection with AI
by: Merolla, Davide, et al.
Published: (2024)
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
by: Fang, Jiarui, et al.
Published: (2024)
ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks
by: Shi, Ziji, et al.
Published: (2024)
Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition
by: Hoque, Adnan, et al.
Published: (2024)
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
by: An, Wei, et al.
Published: (2024)
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
by: Zhang, Jianqing, et al.
Published: (2024)
Towards Scalable GPU-Accelerated SNN Training via Temporal Fusion
by: Li, Yanchen, et al.
Published: (2024)
Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training
by: Cao, Ray, et al.
Published: (2024)
Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads
by: Wilkins, Grant, et al.
Published: (2024)
A Blockchain and Artificial Intelligence based System for Halal Food Traceability
by: Alourani, Abdulla, et al.
Published: (2024)