| Main Authors: | Xin, Jihao, Lyu, Tian, Pan, Qilong, Wang, Kesen, Canini, Marco |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.09595 |
Similar Items
Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees
by: Xin, Jihao, et al.
Published: (2023)
FilFL: Client Filtering for Optimized Client Participation in Federated Learning
by: Fourati, Fares, et al.
Published: (2023)
TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training
by: Liu, Man, et al.
Published: (2026)
Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs
by: Yu, Guangba, et al.
Published: (2026)
NeurLZ: An Online Neural Learning-Based Method to Enhance Scientific Lossy Compression
by: Jia, Wenqi, et al.
Published: (2024)
IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol
by: Yang, Ningyuan, et al.
Published: (2025)
ModTrans: Translating Real-world Models for Distributed Training Simulator
by: Lyu, Yi
Published: (2026)
Why Should the Server Do It All?: A Scalable, Versatile, and Model-Agnostic Framework for Server-Light DNN Inference over Massively Distributed Clients via Training-Free Intermediate Feature Compression
by: Sung, Mingyu, et al.
Published: (2025)
UCCL-Zip: Lossless Compression Supercharged GPU Communication
by: Ma, Shuang, et al.
Published: (2026)
Flashback: Understanding and Mitigating Forgetting in Federated Learning
by: Aljahdali, Mohammed, et al.
Published: (2024)
Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices
by: Wang, Li, et al.
Published: (2024)
Smaller, Smarter, Closer: The Edge of Collaborative Generative AI
by: Morabito, Roberto, et al.
Published: (2025)
Training LLMs with Fault Tolerant HSDP on 100,000 GPUs
by: Salpekar, Omkar, et al.
Published: (2026)
Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
by: Kim, Taeyoon, et al.
Published: (2026)
GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data
by: Jia, Wenqi, et al.
Published: (2024)
ELANA: A Simple Energy and Latency Analyzer for LLMs
by: Chiang, Hung-Yueh, et al.
Published: (2025)
SMART: When is it Actually Worth Expanding a Speculative Tree?
by: Wang, Lifu, et al.
Published: (2026)
PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning
by: Wang, Yisu, et al.
Published: (2025)
Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention
by: Liao, Mengqi, et al.
Published: (2026)
Accelerating Large Language Model Training with Hybrid GPU-based Compression
by: Xu, Lang, et al.
Published: (2024)
Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding
by: Li, Chengxi, et al.
Published: (2026)
FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)
KVComp: A High-Performance, LLM-Aware, Lossy Compression Framework for KV Cache
by: Jiang, Bo, et al.
Published: (2025)
A Model Aware AIGC Task Offloading Algorithm in IIoT Edge Computing
by: Wang, Xin, et al.
Published: (2025)
PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression
by: Jiang, Bo, et al.
Published: (2025)
An Evaluation of LLMs Inference on Popular Single-board Computers
by: Tung, et al.
Published: (2025)
Balanced and Elastic End-to-end Training of Dynamic LLMs
by: Wahib, Mohamed, et al.
Published: (2025)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
by: Tang, Zhenheng, et al.
Published: (2024)
FlashRecovery: Fast and Low-Cost Recovery from Failures for Large-Scale Training of LLMs
by: Zhang, Haijun, et al.
Published: (2025)
High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments
by: Rodriguez, Julian, et al.
Published: (2025)
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
by: Fang, Jiarui, et al.
Published: (2024)
Marconi: Prefix Caching for the Era of Hybrid LLMs
by: Pan, Rui, et al.
Published: (2024)
SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization
by: Tschand, Arya, et al.
Published: (2025)
Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism
by: Pan, Xinglin, et al.
Published: (2025)
Design a Win-Win Strategy That Is Fair to Both Service Providers and Tasks When Rejection Is Not an Option
by: Trabelsi, Yohai, et al.
Published: (2024)
Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems
by: Yang, Haowei, et al.
Published: (2025)
ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management
by: Pan, Zaifeng, et al.
Published: (2026)
FreeRide: Harvesting Bubbles in Pipeline Parallelism
by: Zhang, Jiashu, et al.
Published: (2024)
ClusterRCA: An End-to-End Approach for Network Fault Localization and Classification for HPC System
by: Sun, Yongqian, et al.
Published: (2025)
Domain-Adaptive Model Merging Across Disconnected Modes
by: Liu, Junming, et al.
Published: (2026)