Library Catalog

Bibliographic Details
Main Authors:	Liao, Changyue; Sun, Mo; Yang, Zihan; Xie, Jun; Chen, Kaiqi; Yuan, Binhang; Wu, Fei; Wang, Zeke
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2403.06504

Similar Items

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
by: Sun, Mo, et al.
Published: (2024)

Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)

LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices
by: Ding, Chuntao, et al.
Published: (2024)

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
by: Jangda, Abhinav, et al.
Published: (2023)

HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2025)

Low-Latency Federated Fine-Tuning for Large Language Models Over Wireless Networks
by: Pang, Zhiwen, et al.
Published: (2026)

EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models
by: Liu, Han, et al.
Published: (2025)

CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
by: Li, Suyi, et al.
Published: (2024)

BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters
by: Zhang, Kunming, et al.
Published: (2025)

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
by: Jiang, Youhe, et al.
Published: (2025)

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
by: Jiang, Youhe, et al.
Published: (2025)

FedQuad: Adaptive Layer-wise LoRA Deployment and Activation Quantization for Federated Fine-Tuning
by: Li, Rukuo, et al.
Published: (2025)

SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2024)

Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures
by: Barrak, Amine, et al.
Published: (2025)

Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
by: Mo, Zizhao, et al.
Published: (2024)

An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
by: Yang, Ruijia, et al.
Published: (2026)

Memory-Efficient Split Federated Learning for LLM Fine-Tuning on Heterogeneous Mobile Devices
by: Chen, Xiaopei, et al.
Published: (2025)

HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences
by: Gu, Jianfeng, et al.
Published: (2025)

Exploring Selective Layer Fine-Tuning in Federated Learning
by: Sun, Yuchang, et al.
Published: (2024)

The Energy Cost of Execution-Idle in GPU Clusters
by: Lei, Yiran, et al.
Published: (2026)

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)

LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
by: Zhu, Zhanda, et al.
Published: (2025)

AgentServe: Algorithm-System Co-Design for Efficient Agentic AI Serving on a Consumer-Grade GPU
by: Zhang, Yuning, et al.
Published: (2026)

Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
by: Liao, Gang, et al.
Published: (2023)

Fine-Tuning GPT-5 for GPU Kernel Generation
by: Tehrani, Ali, et al.
Published: (2026)

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)

Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA
by: Wang, Xiaoyu, et al.
Published: (2026)

GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
by: Duan, Jiaang, et al.
Published: (2025)

Fed-pilot: Optimizing LoRA Allocation for Efficient Federated Fine-Tuning with Heterogeneous Clients
by: Zhang, Zikai, et al.
Published: (2024)

FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics
by: Wang, Zeke, et al.
Published: (2025)

Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures
by: Shan, Baodi, et al.
Published: (2024)

Host-Side Telemetry for Performance Diagnosis in Cloud and HPC GPU Infrastructure
by: Darzi, Erfan, et al.
Published: (2025)

Dataflow-Oriented Classification and Performance Analysis of GPU-Accelerated Homomorphic Encryption
by: Nozaki, Ai, et al.
Published: (2026)

TX-Digital Twin: Visualizing Supercomputer GPU Performance Data Stream
by: Baskakova, Elena, et al.
Published: (2026)

HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
by: Liang, Yan, et al.
Published: (2026)

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
by: Jiang, Youhe, et al.
Published: (2023)

HexiScale: Facilitating Large Language Model Training over Heterogeneous Hardware
by: Yan, Ran, et al.
Published: (2024)

Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
by: Yu, Minchen, et al.
Published: (2023)

PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)