Library Catalog

Bibliographic Details
Main Authors:	Sritriratanarak, Warisa; Garcia, Paulo
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.05817

Similar Items

Transforming Future Data Center Operations and Management via Physical AI
by: Cao, Zhiwei, et al.
Published: (2025)

ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture
by: Dobrin, Seth, et al.
Published: (2026)

Federated Learning for Cyber Physical Systems: A Comprehensive Survey
by: Quan, Minh K., et al.
Published: (2025)

DataCenterGym: A Physics-Grounded Simulator for Multi-Objective Data Center Scheduling
by: Pathak, Nilavra, et al.
Published: (2026)

AI Factories: It's time to rethink the Cloud-HPC divide
by: Lopez, Pedro Garcia, et al.
Published: (2025)

AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research
by: Heredia, Ignacio, et al.
Published: (2025)

FreeRide: Harvesting Bubbles in Pipeline Parallelism
by: Zhang, Jiashu, et al.
Published: (2024)

KunServe: Parameter-centric Memory Management for Efficient Memory Overloading Handling in LLM Serving
by: Cheng, Rongxin, et al.
Published: (2024)

Ensemble Method for System Failure Detection Using Large-Scale Telemetry Data
by: Mudgal, Priyanka, et al.
Published: (2024)

Topology-aware Preemptive Scheduling for Co-located LLM Workloads
by: Zhang, Ping, et al.
Published: (2024)

Can Large Language Models Write Parallel Code?
by: Nichols, Daniel, et al.
Published: (2024)

LLM as HPC Expert: Extending RAG Architecture for HPC Data
by: Miyashita, Yusuke, et al.
Published: (2024)

Boosting Asynchronous Decentralized Learning with Model Fragmentation
by: Biswas, Sayan, et al.
Published: (2024)

FedRAV: Hierarchically Federated Region-Learning for Traffic Object Classification of Autonomous Vehicles
by: Zhai, Yijun, et al.
Published: (2024)

FedPAW: Federated Learning with Personalized Aggregation Weights for Urban Vehicle Speed Prediction
by: He, Yuepeng, et al.
Published: (2024)

FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation
by: Palihawadana, Chamath, et al.
Published: (2024)

Isambard-AI: a leadership class supercomputer optimised specifically for Artificial Intelligence
by: McIntosh-Smith, Simon, et al.
Published: (2024)

SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile
by: Zhang, Ruisi, et al.
Published: (2024)

Dynamic Resource Allocation for Virtual Machine Migration Optimization using Machine Learning
by: Gong, Yulu, et al.
Published: (2024)

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
by: Cheng, Jialiang, et al.
Published: (2024)

Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors
by: Chen, Siyuan, et al.
Published: (2024)

Analytically-Driven Resource Management for Cloud-Native Microservices
by: Zhang, Yanqi, et al.
Published: (2024)

Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
by: Yang, Yuting, et al.
Published: (2024)

Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and Framework
by: Jia, Ziye, et al.
Published: (2024)

Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint
by: Li, Jun, et al.
Published: (2024)

HETHUB: A Distributed Training System with Heterogeneous Cluster for Large-Scale Models
by: Xu, Si, et al.
Published: (2024)

Towards using Reinforcement Learning for Scaling and Data Replication in Cloud Systems
by: Mokadem, Riad, et al.
Published: (2024)

TS-EoH: An Edge Server Task Scheduling Algorithm Based on Evolution of Heuristic
by: Yatong, Wang, et al.
Published: (2024)

Reinforcement Learning-driven Data-intensive Workflow Scheduling for Volunteer Edge-Cloud
by: Mounesan, Motahare, et al.
Published: (2024)

ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving
by: Huang, Tao, et al.
Published: (2024)

Automated Road Safety: Enhancing Sign and Surface Damage Detection with AI
by: Merolla, Davide, et al.
Published: (2024)

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
by: Fang, Jiarui, et al.
Published: (2024)

ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks
by: Shi, Ziji, et al.
Published: (2024)

Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition
by: Hoque, Adnan, et al.
Published: (2024)

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
by: An, Wei, et al.
Published: (2024)

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
by: Zhang, Jianqing, et al.
Published: (2024)

Towards Scalable GPU-Accelerated SNN Training via Temporal Fusion
by: Li, Yanchen, et al.
Published: (2024)

Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training
by: Cao, Ray, et al.
Published: (2024)

Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads
by: Wilkins, Grant, et al.
Published: (2024)

A Blockchain and Artificial Intelligence based System for Halal Food Traceability
by: Alourani, Abdulla, et al.
Published: (2024)