:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Fan; Li, Guodong; Wu, Si; Fang, Weijun; Hu, Sihuang
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2512.10425

Similar Items

CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration
by: Nian, Sean, et al.
Published: (2026)

Lumiere: Making Optimal BFT for Partial Synchrony Practical
by: Lewis-Pye, Andrew, et al.
Published: (2023)

Cascadia: An Efficient Cascade Serving System for Large Language Models
by: Jiang, Youhe, et al.
Published: (2025)

Amortized Asynchronous Byzantine Reliable Broadcast with Optimal Resilience
by: Hu, Michael Yiqing, et al.
Published: (2026)

Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems
by: Lu, Ning, et al.
Published: (2024)

An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters
by: Zhang, Mingjun, et al.
Published: (2025)

Efficient Profit Maximization in Reliability Concerned Static Vehicular Cloud System
by: Sarkar, Suvarthi, et al.
Published: (2023)

To Repair or Not to Repair: Assessing Fault Resilience in MPI Stencil Applications
by: Rocco, Roberto, et al.
Published: (2024)

New Wide Locally Recoverable Codes with Unified Locality
by: Xu, Liangliang, et al.
Published: (2025)

Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
by: Sun, Xun, et al.
Published: (2026)

IsoSched: Preemptive Tile Cascaded Scheduling of Multi-DNN via Subgraph Isomorphism
by: Zhao, Boran, et al.
Published: (2025)

Making Serverless Computing Extensible: A Case Study of Serverless Data Analytics
by: Yu, Minchen, et al.
Published: (2025)

CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing
by: Yuan, Yitao, et al.
Published: (2025)

Optimistic, Signature-Free Reliable Broadcast and Its Applications
by: Shrestha, Nibesh, et al.
Published: (2025)

Parallel Collaborative ADMM Privacy Computing and Adaptive GPU Acceleration for Distributed Edge Networks
by: Xia, Mengchun, et al.
Published: (2026)

Fault-Tolerant Hybrid-Parallel Training at Scale with Reliable and Efficient In-memory Checkpointing
by: Wang, Yuxin, et al.
Published: (2023)

SpotVista: Availability-Aware Recommendation System for Reliable and Cost-Efficient Multi-Node Spot Instances
by: Kim, Taeyoon, et al.
Published: (2026)

ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models
by: Singh, Gursimran, et al.
Published: (2025)

CascadeServe: Unlocking Model Cascades for Inference Serving
by: Kossmann, Ferdi, et al.
Published: (2024)

Joint$λ$: Orchestrating Serverless Workflows on Jointcloud FaaS Systems
by: Li, Rui, et al.
Published: (2025)

Dynamic Probabilistic Reliable Broadcast
by: Anikina, Veronika, et al.
Published: (2023)

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
by: Fang, Jiahao, et al.
Published: (2024)

Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study
by: Lacinski, Lukasz, et al.
Published: (2024)

Pier: Efficient Large Language Model pretraining with Relaxed Global Communication
by: Fan, Shuyuan, et al.
Published: (2025)

FWeb3: A Practical Incentive-Aware Federated Learning Framework
by: Yan, Peishen, et al.
Published: (2026)

Exact, Efficient, and Reliable Multi-Objective and Multi-Constrained IoT Workflow Scheduling in Edge-Hub-Cloud Cyber-Physical Systems
by: Kouloumpris, Andreas, et al.
Published: (2026)

Highly-Efficient Persistent FIFO Queues
by: Fatourou, Panagiota, et al.
Published: (2024)

GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources
by: Du, Boxiao, et al.
Published: (2026)

OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency
by: Wang, Jun, et al.
Published: (2025)

Reliable Replication Protocols on SmartNICs
by: Katebzadeh, M. R. Siavash, et al.
Published: (2025)

ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale
by: Shi, Ge, et al.
Published: (2025)

HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)

M$^2$-MFP: A Multi-Scale and Multi-Level Memory Failure Prediction Framework for Reliable Cloud Infrastructure
by: Xie, Hongyi, et al.
Published: (2025)

Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation
by: Wu, Tianyuan, et al.
Published: (2025)

STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
by: Chen, Yinfang, et al.
Published: (2025)

A Unified, Practical, and Understandable Model of Non-transactional Consistency Levels in Distributed Replication
by: Hu, Guanzhou, et al.
Published: (2024)

Cicada: A Pipeline-Efficient Approach to Serverless Inference with Decoupled Management
by: Wu, Z., et al.
Published: (2025)

S-HPLB: Efficient LLM Attention Serving via Sparsity-Aware Head Parallelism Load Balance
by: Liu, Di, et al.
Published: (2026)

Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis
by: Li, Yuetai, et al.
Published: (2025)

SimDC: A High-Fidelity Device Simulation Platform for Device-Cloud Collaborative Computing
by: Pei, Ruiguang, et al.
Published: (2025)