Library Catalog

Bibliographic Details
Main Authors:	Li, Rui; Zhi, Xiaoyun; Chi, Jinxin; Yu, Menghan; Huang, Lixin; Zhu, Jia; Zhang, Weilun; Ma, Xing; Liu, Wenjia; Zhu, Zhicheng; Luo, Daowen; Song, Zuquan; Yin, Xin; Xiang, Chao; Wang, Shuguang; Xiao, Wencong; Cooperman, Gene
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2507.12619

Similar Items

Enabling Practical Transparent Checkpointing for MPI: A Topological Sort Approach
by: Xu, Yao, et al.
Published: (2024)

HotSwap: Enabling Live Dependency Sharing in Serverless Computing
by: Li, Rui, et al.
Published: (2024)

The Case for ABI Interoperability in a Fault Tolerant MPI
by: Xu, Yao, et al.
Published: (2025)

Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training
by: Deng, Yangtao, et al.
Published: (2025)

Seer: Predictive Runtime Kernel Selection for Irregular Problems
by: Swann, Ryan, et al.
Published: (2024)

Robust LLM Training Infrastructure at ByteDance
by: Wan, Borui, et al.
Published: (2025)

Understanding Stragglers in Large Model Training Using What-if Analysis
by: Lin, Jinkun, et al.
Published: (2025)

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
by: Qin, Ruoyu, et al.
Published: (2025)

Seer: Proactive Revenue-Aware Scheduling for Live Streaming Services in Crowdsourced Cloud-Edge Platforms
by: Huang, Shaoyuan, et al.
Published: (2024)

Mitigating GIL Bottlenecks in Edge AI Systems
by: Mandal, Mridankan, et al.
Published: (2026)

Analyzing Performance Bottlenecks in Zero-Knowledge Proof Based Rollups on Ethereum
by: Habib, Md. Ahsan
Published: (2025)

Revisiting finite Abelian hidden subgroup problem and its distributed exact quantum algorithm
by: Dong, Ziyuan, et al.
Published: (2025)

Minder: Faulty Machine Detection for Large-scale Distributed Model Training
by: Deng, Yangtao, et al.
Published: (2024)

Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
by: He, Guoliang, et al.
Published: (2025)

Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
by: Lin, Ruhai, et al.
Published: (2024)

One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
by: Yu, Wenjun, et al.
Published: (2026)

Distributed Quantum Discrete Logarithm Algorithm
by: Xu, Renjie, et al.
Published: (2026)

PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System
by: Frouzakis, Manos, et al.
Published: (2025)

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)

AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence
by: Cai, Zhijie, et al.
Published: (2025)

CoCoI: Distributed Coded Inference System for Straggler Mitigation
by: Liu, Xing, et al.
Published: (2025)

Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm
by: Lyu, Zhonghao, et al.
Published: (2024)

SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis
by: Ghiasi, Nika Mansouri, et al.
Published: (2025)

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
by: Zhang, Xinyi, et al.
Published: (2024)

Unlearning during Learning: An Efficient Federated Machine Unlearning Method
by: Gu, Hanlin, et al.
Published: (2024)

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
by: Wu, Yongtong, et al.
Published: (2026)

Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks
by: Larsson, Oliver, et al.
Published: (2026)

Breaking the Aggregation Bottleneck in Federated Recommendation: A Personalized Model Merging Approach
by: Chen, Jundong, et al.
Published: (2025)

Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques
by: Bera, Rahul
Published: (2026)

Breaking the Capacity Bottleneck in Model-Heterogeneous Federated Learning via Gradual Model Restoration
by: Ma, Chengjie, et al.
Published: (2025)

Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
by: Arif, Moiz, et al.
Published: (2026)

Atomicity in Distributed Quantum Computing
by: Zhang, Zhicheng, et al.
Published: (2024)

Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading
by: Meng, William, et al.
Published: (2025)

High-Efficiency Split Computing for Cooperative Edge Systems: A Novel Compressed Sensing Bottleneck
by: Zhong, Hailin, et al.
Published: (2025)

Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale
by: Bu, Tianci, et al.
Published: (2026)

Stochastic Controlled Averaging for Federated Learning with Communication Compression
by: Huang, Xinmeng, et al.
Published: (2023)

FedSR: A Semi-Decentralized Federated Learning Algorithm for Non-IIDness in IoT System
by: Huang, Jianjun, et al.
Published: (2024)

Efficient Heterogeneous Large Language Model Decoding with Model-Attention Disaggregation
by: Chen, Shaoyuan, et al.
Published: (2024)

Implementing True MPI Sessions and Evaluating MPI Initialization Scalability
by: Zhou, Hui, et al.
Published: (2026)

An Initial Evaluation of Distributed Graph Algorithms using NWGraph and HPX
by: Mohammadiporshokooh, Karame, et al.
Published: (2026)