:: Library Catalog

Bibliographic Details
Main Authors:	Xin, Jihao; Lyu, Tian; Pan, Qilong; Wang, Kesen; Canini, Marco
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.09595

Similar Items

Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees
by: Xin, Jihao, et al.
Published: (2023)

FilFL: Client Filtering for Optimized Client Participation in Federated Learning
by: Fourati, Fares, et al.
Published: (2023)

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training
by: Liu, Man, et al.
Published: (2026)

Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs
by: Yu, Guangba, et al.
Published: (2026)

NeurLZ: An Online Neural Learning-Based Method to Enhance Scientific Lossy Compression
by: Jia, Wenqi, et al.
Published: (2024)

IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol
by: Yang, Ningyuan, et al.
Published: (2025)

ModTrans: Translating Real-world Models for Distributed Training Simulator
by: Lyu, Yi
Published: (2026)

Why Should the Server Do It All?: A Scalable, Versatile, and Model-Agnostic Framework for Server-Light DNN Inference over Massively Distributed Clients via Training-Free Intermediate Feature Compression
by: Sung, Mingyu, et al.
Published: (2025)

UCCL-Zip: Lossless Compression Supercharged GPU Communication
by: Ma, Shuang, et al.
Published: (2026)

Flashback: Understanding and Mitigating Forgetting in Federated Learning
by: Aljahdali, Mohammed, et al.
Published: (2024)

Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices
by: Wang, Li, et al.
Published: (2024)

Smaller, Smarter, Closer: The Edge of Collaborative Generative AI
by: Morabito, Roberto, et al.
Published: (2025)

Training LLMs with Fault Tolerant HSDP on 100,000 GPUs
by: Salpekar, Omkar, et al.
Published: (2026)

Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
by: Kim, Taeyoon, et al.
Published: (2026)

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data
by: Jia, Wenqi, et al.
Published: (2024)

ELANA: A Simple Energy and Latency Analyzer for LLMs
by: Chiang, Hung-Yueh, et al.
Published: (2025)

SMART: When is it Actually Worth Expanding a Speculative Tree?
by: Wang, Lifu, et al.
Published: (2026)

PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning
by: Wang, Yisu, et al.
Published: (2025)

Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention
by: Liao, Mengqi, et al.
Published: (2026)

Accelerating Large Language Model Training with Hybrid GPU-based Compression
by: Xu, Lang, et al.
Published: (2024)

Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding
by: Li, Chengxi, et al.
Published: (2026)

FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)

KVComp: A High-Performance, LLM-Aware, Lossy Compression Framework for KV Cache
by: Jiang, Bo, et al.
Published: (2025)

A Model Aware AIGC Task Offloading Algorithm in IIoT Edge Computing
by: Wang, Xin, et al.
Published: (2025)

PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression
by: Jiang, Bo, et al.
Published: (2025)

An Evaluation of LLMs Inference on Popular Single-board Computers
by: Tung, et al.
Published: (2025)

Balanced and Elastic End-to-end Training of Dynamic LLMs
by: Wahib, Mohamed, et al.
Published: (2025)

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
by: Tang, Zhenheng, et al.
Published: (2024)

FlashRecovery: Fast and Low-Cost Recovery from Failures for Large-Scale Training of LLMs
by: Zhang, Haijun, et al.
Published: (2025)

High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments
by: Rodriguez, Julian, et al.
Published: (2025)

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
by: Fang, Jiarui, et al.
Published: (2024)

Marconi: Prefix Caching for the Era of Hybrid LLMs
by: Pan, Rui, et al.
Published: (2024)

SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization
by: Tschand, Arya, et al.
Published: (2025)

Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism
by: Pan, Xinglin, et al.
Published: (2025)

Design a Win-Win Strategy That Is Fair to Both Service Providers and Tasks When Rejection Is Not an Option
by: Trabelsi, Yohai, et al.
Published: (2024)

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems
by: Yang, Haowei, et al.
Published: (2025)

ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management
by: Pan, Zaifeng, et al.
Published: (2026)

FreeRide: Harvesting Bubbles in Pipeline Parallelism
by: Zhang, Jiashu, et al.
Published: (2024)

ClusterRCA: An End-to-End Approach for Network Fault Localization and Classification for HPC System
by: Sun, Yongqian, et al.
Published: (2025)

Domain-Adaptive Model Merging Across Disconnected Modes
by: Liu, Junming, et al.
Published: (2026)