Library Catalog

Bibliographic Details
Main Authors:	Yim, Jinkyu, Song, Jaeyong, Choi, Yerim, Lee, Jaebeen, Jung, Jaewon, Jang, Hongsun, Lee, Jinho
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing; Machine Learning
Online Access:	https://arxiv.org/abs/2405.18093

Similar Items

GraNNDis: Efficient Unified Distributed Training Framework for Deep GNNs on Large Clusters
by: Song, Jaeyong, et al.
Published: (2023)

GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading
by: Song, Jaeyong, et al.
Published: (2026)

NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search
by: Song, Jaeyong, et al.
Published: (2026)

FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation
by: Park, Seongyeon, et al.
Published: (2025)

AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
by: Park, Seongyeon, et al.
Published: (2024)

DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization
by: An, Hyeonjun, et al.
Published: (2026)

Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)

Fine-grained MoE Load Balancing with Linear Programming
by: Zhao, Chenqi, et al.
Published: (2025)

VersaSlot: Efficient Fine-grained FPGA Sharing with Big.Little Slots and Live Migration in FPGA Cluster
by: Gu, Jianfeng, et al.
Published: (2025)

Wireless Distributed Matrix-Vector Multiplication using Over-the-Air Computation and Analog Coding
by: Choi, Jinho
Published: (2024)

Evaluating Malleable Job Scheduling in HPC Clusters using Real-World Workloads
by: Zojer, Patrick, et al.
Published: (2026)

FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
by: Bin, Kyungmin, et al.
Published: (2025)

Addressing Variable Heterogeneity in Distributed Multimodal Training with Entrain
by: Jang, Insu, et al.
Published: (2026)

Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores
by: Los, Denis, et al.
Published: (2024)

PathWeaver: A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
by: Kim, Sukjin, et al.
Published: (2025)

ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains
by: Woo, Hyein, et al.
Published: (2025)

In-Vehicle Edge System for Real-Time Dashcam Video Analysis
by: Lee, Seyul, et al.
Published: (2024)

Galvatron: Automatic Distributed Training for Large Transformer Models
by: Gumaan, Esmail
Published: (2025)

ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity
by: Chen, Long, et al.
Published: (2025)

Beyond A Single AI Cluster: A Survey of Decentralized LLM Training
by: Dong, Haotian, et al.
Published: (2025)

MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training
by: Zhao, Pinxue, et al.
Published: (2024)

TT-Edge: A Hardware-Software Co-Design for Energy-Efficient Tensor-Train Decomposition on Edge AI
by: Kwak, Hyunseok, et al.
Published: (2025)

Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
by: Wu, Jingfeng, et al.
Published: (2025)

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)

Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism
by: Li, Cong, et al.
Published: (2025)

Enabling Elastic Model Serving with MultiWorld
by: Lee, Myungjin, et al.
Published: (2024)

MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training
by: Zhao, Lu, et al.
Published: (2025)

Sailor: Automating Distributed Training over Dynamic, Heterogeneous, and Geo-distributed Clusters
by: Strati, Foteini, et al.
Published: (2025)

Efficient Distributed MLLM Training with Cornstarch
by: Jang, Insu, et al.
Published: (2025)

HetCCL: Accelerating LLM Training with Heterogeneous GPUs
by: Kim, Heehoon, et al.
Published: (2026)

Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
by: Qiao, Tong, et al.
Published: (2025)

Action Deviation-Aware Inference for Low-Latency Wireless Robots
by: Park, Jeyoung, et al.
Published: (2025)

BlockLLM: Multi-tenant Finer-grained Serving for Large Language Models
by: Hu, Bodun, et al.
Published: (2024)

FCPO: Federated Continual Policy Optimization for Real-Time High-Throughput Edge Video Analytics
by: Liebe, Lucas, et al.
Published: (2025)

Training DNN Models over Heterogeneous Clusters with Optimal Performance
by: Nie, Chengyi, et al.
Published: (2024)

Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
by: Guo, Runsheng Benson, et al.
Published: (2024)

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)

Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters
by: Guo, Runsheng Benson, et al.
Published: (2025)

TOD: Transprecise Object Detection to Maximise Real-Time Accuracy on the Edge
by: Lee, JunKyu, et al.
Published: (2021)

Design and Implementation of an Automated Disaster-recovery System for a Kubernetes Cluster Using LSTM
by: Kim, Ji-Beom, et al.
Published: (2024)