Bibliographic Details
Main Authors: Yang, Mengtian; Zhang, Zhekun; Wu, Mingheng; Yan, Jianwen; Sun, Hanshi; Chang, Li-wen
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2605.17164
_version_ 1866916029998825472
author Yang, Mengtian
Zhang, Zhekun
Wu, Mingheng
Yan, Jianwen
Sun, Hanshi
Chang, Li-wen
author_facet Yang, Mengtian
Zhang, Zhekun
Wu, Mingheng
Yan, Jianwen
Sun, Hanshi
Chang, Li-wen
contents Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and rapid performance simulation is critical for guiding optimization efforts and system studies by validating "what-if" hypotheses. To address this, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show Charon achieves high accuracy across different models and configurations, with an overall prediction error consistently under 5.35%, and even under 3.74% for training with a large-scale GPU cluster. In a practical inference deployment case, Charon discovered a configuration that improved system throughput over an engineering-tuned baseline, demonstrating its significant real-world value.
format Preprint
id arxiv_https___arxiv_org_abs_2605_17164
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference
Yang, Mengtian
Zhang, Zhekun
Wu, Mingheng
Yan, Jianwen
Sun, Hanshi
Chang, Li-wen
Distributed, Parallel, and Cluster Computing
Artificial Intelligence
Machine Learning
Programming Languages
Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and rapid performance simulation is critical for guiding optimization efforts and system studies by validating "what-if" hypotheses. To address this, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show Charon achieves high accuracy across different models and configurations, with an overall prediction error consistently under 5.35%, and even under 3.74% for training with a large-scale GPU cluster. In a practical inference deployment case, Charon discovered a configuration that improved system throughput over an engineering-tuned baseline, demonstrating its significant real-world value.
title Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference
topic Distributed, Parallel, and Cluster Computing
Artificial Intelligence
Machine Learning
Programming Languages
url https://arxiv.org/abs/2605.17164