| Main Authors: | Li, Rui, Zhi, Xiaoyun, Chi, Jinxin, Yu, Menghan, Huang, Lixin, Zhu, Jia, Zhang, Weilun, Ma, Xing, Liu, Wenjia, Zhu, Zhicheng, Luo, Daowen, Song, Zuquan, Yin, Xin, Xiang, Chao, Wang, Shuguang, Xiao, Wencong, Cooperman, Gene |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.12619 |
Similar Items
Enabling Practical Transparent Checkpointing for MPI: A Topological Sort Approach
by: Xu, Yao, et al.
Published: (2024)
HotSwap: Enabling Live Dependency Sharing in Serverless Computing
by: Li, Rui, et al.
Published: (2024)
The Case for ABI Interoperability in a Fault Tolerant MPI
by: Xu, Yao, et al.
Published: (2025)
Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training
by: Deng, Yangtao, et al.
Published: (2025)
Seer: Predictive Runtime Kernel Selection for Irregular Problems
by: Swann, Ryan, et al.
Published: (2024)
Robust LLM Training Infrastructure at ByteDance
by: Wan, Borui, et al.
Published: (2025)
Understanding Stragglers in Large Model Training Using What-if Analysis
by: Lin, Jinkun, et al.
Published: (2025)
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
by: Qin, Ruoyu, et al.
Published: (2025)
Seer: Proactive Revenue-Aware Scheduling for Live Streaming Services in Crowdsourced Cloud-Edge Platforms
by: Huang, Shaoyuan, et al.
Published: (2024)
Mitigating GIL Bottlenecks in Edge AI Systems
by: Mandal, Mridankan, et al.
Published: (2026)
Analyzing Performance Bottlenecks in Zero-Knowledge Proof Based Rollups on Ethereum
by: Habib, Md. Ahsan
Published: (2025)
Revisiting finite Abelian hidden subgroup problem and its distributed exact quantum algorithm
by: Dong, Ziyuan, et al.
Published: (2025)
Minder: Faulty Machine Detection for Large-scale Distributed Model Training
by: Deng, Yangtao, et al.
Published: (2024)
Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
by: He, Guoliang, et al.
Published: (2025)
Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
by: Lin, Ruhai, et al.
Published: (2024)
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
by: Yu, Wenjun, et al.
Published: (2026)
Distributed Quantum Discrete Logarithm Algorithm
by: Xu, Renjie, et al.
Published: (2026)
PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System
by: Frouzakis, Manos, et al.
Published: (2025)
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)
AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence
by: Cai, Zhijie, et al.
Published: (2025)
CoCoI: Distributed Coded Inference System for Straggler Mitigation
by: Liu, Xing, et al.
Published: (2025)
Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm
by: Lyu, Zhonghao, et al.
Published: (2024)
SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis
by: Ghiasi, Nika Mansouri, et al.
Published: (2025)
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
by: Zhang, Xinyi, et al.
Published: (2024)
Unlearning during Learning: An Efficient Federated Machine Unlearning Method
by: Gu, Hanlin, et al.
Published: (2024)
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
by: Wu, Yongtong, et al.
Published: (2026)
Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks
by: Larsson, Oliver, et al.
Published: (2026)
Breaking the Aggregation Bottleneck in Federated Recommendation: A Personalized Model Merging Approach
by: Chen, Jundong, et al.
Published: (2025)
Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques
by: Bera, Rahul
Published: (2026)
Breaking the Capacity Bottleneck in Model-Heterogeneous Federated Learning via Gradual Model Restoration
by: Ma, Chengjie, et al.
Published: (2025)
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
by: Arif, Moiz, et al.
Published: (2026)
Atomicity in Distributed Quantum Computing
by: Zhang, Zhicheng, et al.
Published: (2024)
Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading
by: Meng, William, et al.
Published: (2025)
High-Efficiency Split Computing for Cooperative Edge Systems: A Novel Compressed Sensing Bottleneck
by: Zhong, Hailin, et al.
Published: (2025)
Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale
by: Bu, Tianci, et al.
Published: (2026)
Stochastic Controlled Averaging for Federated Learning with Communication Compression
by: Huang, Xinmeng, et al.
Published: (2023)
FedSR: A Semi-Decentralized Federated Learning Algorithm for Non-IIDness in IoT System
by: Huang, Jianjun, et al.
Published: (2024)
Efficient Heterogeneous Large Language Model Decoding with Model-Attention Disaggregation
by: Chen, Shaoyuan, et al.
Published: (2024)
Implementing True MPI Sessions and Evaluating MPI Initialization Scalability
by: Zhou, Hui, et al.
Published: (2026)
An Initial Evaluation of Distributed Graph Algorithms using NWGraph and HPX
by: Mohammadiporshokooh, Karame, et al.
Published: (2026)