| Main Authors: | Yim, Jinkyu, Song, Jaeyong, Choi, Yerim, Lee, Jaebeen, Jung, Jaewon, Jang, Hongsun, Lee, Jinho |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.18093 |
Similar Items
GraNNDis: Efficient Unified Distributed Training Framework for Deep GNNs on Large Clusters
by: Song, Jaeyong, et al.
Published: (2023)
GriNNder: Breaking the Memory Capacity Wall in Full-Graph GNN Training with Storage Offloading
by: Song, Jaeyong, et al.
Published: (2026)
NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search
by: Song, Jaeyong, et al.
Published: (2026)
FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation
by: Park, Seongyeon, et al.
Published: (2025)
AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
by: Park, Seongyeon, et al.
Published: (2024)
DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization
by: An, Hyeonjun, et al.
Published: (2026)
Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)
Fine-grained MoE Load Balancing with Linear Programming
by: Zhao, Chenqi, et al.
Published: (2025)
VersaSlot: Efficient Fine-grained FPGA Sharing with Big.Little Slots and Live Migration in FPGA Cluster
by: Gu, Jianfeng, et al.
Published: (2025)
Wireless Distributed Matrix-Vector Multiplication using Over-the-Air Computation and Analog Coding
by: Choi, Jinho
Published: (2024)
Evaluating Malleable Job Scheduling in HPC Clusters using Real-World Workloads
by: Zojer, Patrick, et al.
Published: (2026)
FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
by: Bin, Kyungmin, et al.
Published: (2025)
Addressing Variable Heterogeneity in Distributed Multimodal Training with Entrain
by: Jang, Insu, et al.
Published: (2026)
Exploring Fine-grained Task Parallelism on Simultaneous Multithreading Cores
by: Los, Denis, et al.
Published: (2024)
PathWeaver: A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
by: Kim, Sukjin, et al.
Published: (2025)
ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains
by: Woo, Hyein, et al.
Published: (2025)
In-Vehicle Edge System for Real-Time Dashcam Video Analysis
by: Lee, Seyul, et al.
Published: (2024)
Galvatron: Automatic Distributed Training for Large Transformer Models
by: Gumaan, Esmail
Published: (2025)
ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity
by: Chen, Long, et al.
Published: (2025)
Beyond A Single AI Cluster: A Survey of Decentralized LLM Training
by: Dong, Haotian, et al.
Published: (2025)
MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training
by: Zhao, Pinxue, et al.
Published: (2024)
TT-Edge: A Hardware-Software Co-Design for Energy-Efficient Tensor-Train Decomposition on Edge AI
by: Kwak, Hyunseok, et al.
Published: (2025)
Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
by: Wu, Jingfeng, et al.
Published: (2025)
Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)
Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism
by: Li, Cong, et al.
Published: (2025)
Enabling Elastic Model Serving with MultiWorld
by: Lee, Myungjin, et al.
Published: (2024)
MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training
by: Zhao, Lu, et al.
Published: (2025)
Sailor: Automating Distributed Training over Dynamic, Heterogeneous, and Geo-distributed Clusters
by: Strati, Foteini, et al.
Published: (2025)
Efficient Distributed MLLM Training with Cornstarch
by: Jang, Insu, et al.
Published: (2025)
HetCCL: Accelerating LLM Training with Heterogeneous GPUs
by: Kim, Heehoon, et al.
Published: (2026)
Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
by: Qiao, Tong, et al.
Published: (2025)
Action Deviation-Aware Inference for Low-Latency Wireless Robots
by: Park, Jeyoung, et al.
Published: (2025)
BlockLLM: Multi-tenant Finer-grained Serving for Large Language Models
by: Hu, Bodun, et al.
Published: (2024)
FCPO: Federated Continual Policy Optimization for Real-Time High-Throughput Edge Video Analytics
by: Liebe, Lucas, et al.
Published: (2025)
Training DNN Models over Heterogeneous Clusters with Optimal Performance
by: Nie, Chengyi, et al.
Published: (2024)
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
by: Guo, Runsheng Benson, et al.
Published: (2024)
HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)
Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters
by: Guo, Runsheng Benson, et al.
Published: (2025)
TOD: Transprecise Object Detection to Maximise Real-Time Accuracy on the Edge
by: Lee, JunKyu, et al.
Published: (2021)
Design and Implementation of an Automated Disaster-recovery System for a Kubernetes Cluster Using LSTM
by: Kim, Ji-Beom, et al.
Published: (2024)