| Main authors: | Yang, Mengtian; Zhang, Zhekun; Wu, Mingheng; Yan, Jianwen; Sun, Hanshi; Chang, Li-wen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
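| Subjects: | Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning; Programming Languages |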
| Online access: | https://arxiv.org/abs/2605.17164 |
| _version_ | 1866916029998825472 |
|---|---|
| author | Yang, Mengtian; Zhang, Zhekun; Wu, Mingheng; Yan, Jianwen; Sun, Hanshi; Chang, Li-wen |
| contents | Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and rapid performance simulation is critical for guiding optimization efforts and system studies by validating "what-if" hypotheses. To address this, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show Charon achieves high accuracy across different models and configurations, with an overall prediction error consistently under 5.35%, and even under 3.74% for training with a large-scale GPU cluster. In a practical inference deployment case, Charon discovered a configuration that improved system throughput over an engineering-tuned baseline, demonstrating its significant real-world value. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_17164 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference |
| topic | Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning; Programming Languages |
| url | https://arxiv.org/abs/2605.17164 |