| Main Authors: | Yu, Fan, Li, Guodong, Wu, Si, Fang, Weijun, Hu, Sihuang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.10425 |
Similar Items
CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration
by: Nian, Sean, et al.
Published: (2026)
Lumiere: Making Optimal BFT for Partial Synchrony Practical
by: Lewis-Pye, Andrew, et al.
Published: (2023)
Cascadia: An Efficient Cascade Serving System for Large Language Models
by: Jiang, Youhe, et al.
Published: (2025)
Amortized Asynchronous Byzantine Reliable Broadcast with Optimal Resilience
by: Hu, Michael Yiqing, et al.
Published: (2026)
Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems
by: Lu, Ning, et al.
Published: (2024)
An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters
by: Zhang, Mingjun, et al.
Published: (2025)
Efficient Profit Maximization in Reliability Concerned Static Vehicular Cloud System
by: Sarkar, Suvarthi, et al.
Published: (2023)
To Repair or Not to Repair: Assessing Fault Resilience in MPI Stencil Applications
by: Rocco, Roberto, et al.
Published: (2024)
New Wide Locally Recoverable Codes with Unified Locality
by: Xu, Liangliang, et al.
Published: (2025)
Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
by: Sun, Xun, et al.
Published: (2026)
IsoSched: Preemptive Tile Cascaded Scheduling of Multi-DNN via Subgraph Isomorphism
by: Zhao, Boran, et al.
Published: (2025)
Making Serverless Computing Extensible: A Case Study of Serverless Data Analytics
by: Yu, Minchen, et al.
Published: (2025)
CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing
by: Yuan, Yitao, et al.
Published: (2025)
Optimistic, Signature-Free Reliable Broadcast and Its Applications
by: Shrestha, Nibesh, et al.
Published: (2025)
Parallel Collaborative ADMM Privacy Computing and Adaptive GPU Acceleration for Distributed Edge Networks
by: Xia, Mengchun, et al.
Published: (2026)
Fault-Tolerant Hybrid-Parallel Training at Scale with Reliable and Efficient In-memory Checkpointing
by: Wang, Yuxin, et al.
Published: (2023)
SpotVista: Availability-Aware Recommendation System for Reliable and Cost-Efficient Multi-Node Spot Instances
by: Kim, Taeyoon, et al.
Published: (2026)
ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models
by: Singh, Gursimran, et al.
Published: (2025)
CascadeServe: Unlocking Model Cascades for Inference Serving
by: Kossmann, Ferdi, et al.
Published: (2024)
Joint$λ$: Orchestrating Serverless Workflows on Jointcloud FaaS Systems
by: Li, Rui, et al.
Published: (2025)
Dynamic Probabilistic Reliable Broadcast
by: Anikina, Veronika, et al.
Published: (2023)
PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
by: Fang, Jiahao, et al.
Published: (2024)
Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study
by: Lacinski, Lukasz, et al.
Published: (2024)
Pier: Efficient Large Language Model pretraining with Relaxed Global Communication
by: Fan, Shuyuan, et al.
Published: (2025)
FWeb3: A Practical Incentive-Aware Federated Learning Framework
by: Yan, Peishen, et al.
Published: (2026)
Exact, Efficient, and Reliable Multi-Objective and Multi-Constrained IoT Workflow Scheduling in Edge-Hub-Cloud Cyber-Physical Systems
by: Kouloumpris, Andreas, et al.
Published: (2026)
Highly-Efficient Persistent FIFO Queues
by: Fatourou, Panagiota, et al.
Published: (2024)
GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources
by: Du, Boxiao, et al.
Published: (2026)
OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency
by: Wang, Jun, et al.
Published: (2025)
Reliable Replication Protocols on SmartNICs
by: Katebzadeh, M. R. Siavash, et al.
Published: (2025)
ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale
by: Shi, Ge, et al.
Published: (2025)
HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)
M$^2$-MFP: A Multi-Scale and Multi-Level Memory Failure Prediction Framework for Reliable Cloud Infrastructure
by: Xie, Hongyi, et al.
Published: (2025)
Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation
by: Wu, Tianyuan, et al.
Published: (2025)
STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
by: Chen, Yinfang, et al.
Published: (2025)
A Unified, Practical, and Understandable Model of Non-transactional Consistency Levels in Distributed Replication
by: Hu, Guanzhou, et al.
Published: (2024)
Cicada: A Pipeline-Efficient Approach to Serverless Inference with Decoupled Management
by: Wu, Z., et al.
Published: (2025)
S-HPLB: Efficient LLM Attention Serving via Sparsity-Aware Head Parallelism Load Balance
by: Liu, Di, et al.
Published: (2026)
Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis
by: Li, Yuetai, et al.
Published: (2025)
SimDC: A High-Fidelity Device Simulation Platform for Device-Cloud Collaborative Computing
by: Pei, Ruiguang, et al.
Published: (2025)