| Main Authors: | Liao, Changyue, Sun, Mo, Yang, Zihan, Xie, Jun, Chen, Kaiqi, Yuan, Binhang, Wu, Fei, Wang, Zeke |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.06504 |
Similar Items
LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
by: Sun, Mo, et al.
Published: (2024)
Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)
LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices
by: Ding, Chuntao, et al.
Published: (2024)
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
by: Jangda, Abhinav, et al.
Published: (2023)
HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2025)
Low-Latency Federated Fine-Tuning for Large Language Models Over Wireless Networks
by: Pang, Zhiwen, et al.
Published: (2026)
EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models
by: Liu, Han, et al.
Published: (2025)
CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
by: Li, Suyi, et al.
Published: (2024)
BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters
by: Zhang, Kunming, et al.
Published: (2025)
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
by: Jiang, Youhe, et al.
Published: (2025)
Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
by: Jiang, Youhe, et al.
Published: (2025)
FedQuad: Adaptive Layer-wise LoRA Deployment and Activation Quantization for Federated Fine-Tuning
by: Li, Rukuo, et al.
Published: (2025)
SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2024)
Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures
by: Barrak, Amine, et al.
Published: (2025)
Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
by: Mo, Zizhao, et al.
Published: (2024)
An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
by: Yang, Ruijia, et al.
Published: (2026)
Memory-Efficient Split Federated Learning for LLM Fine-Tuning on Heterogeneous Mobile Devices
by: Chen, Xiaopei, et al.
Published: (2025)
HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences
by: Gu, Jianfeng, et al.
Published: (2025)
Exploring Selective Layer Fine-Tuning in Federated Learning
by: Sun, Yuchang, et al.
Published: (2024)
The Energy Cost of Execution-Idle in GPU Clusters
by: Lei, Yiran, et al.
Published: (2026)
Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)
LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
by: Zhu, Zhanda, et al.
Published: (2025)
AgentServe: Algorithm-System Co-Design for Efficient Agentic AI Serving on a Consumer-Grade GPU
by: Zhang, Yuning, et al.
Published: (2026)
Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)
Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
by: Liao, Gang, et al.
Published: (2023)
Fine-Tuning GPT-5 for GPU Kernel Generation
by: Tehrani, Ali, et al.
Published: (2026)
Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)
Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA
by: Wang, Xiaoyu, et al.
Published: (2026)
GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
by: Duan, Jiaang, et al.
Published: (2025)
Fed-pilot: Optimizing LoRA Allocation for Efficient Federated Fine-Tuning with Heterogeneous Clients
by: Zhang, Zikai, et al.
Published: (2024)
FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics
by: Wang, Zeke, et al.
Published: (2025)
Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures
by: Shan, Baodi, et al.
Published: (2024)
Host-Side Telemetry for Performance Diagnosis in Cloud and HPC GPU Infrastructure
by: Darzi, Erfan, et al.
Published: (2025)
Dataflow-Oriented Classification and Performance Analysis of GPU-Accelerated Homomorphic Encryption
by: Nozaki, Ai, et al.
Published: (2026)
TX-Digital Twin: Visualizing Supercomputer GPU Performance Data Stream
by: Baskakova, Elena, et al.
Published: (2026)
HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
by: Liang, Yan, et al.
Published: (2026)
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
by: Jiang, Youhe, et al.
Published: (2023)
HexiScale: Facilitating Large Language Model Training over Heterogeneous Hardware
by: Yan, Ran, et al.
Published: (2024)
Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
by: Yu, Minchen, et al.
Published: (2023)
PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)