:: Library Catalog

Bibliographic Details
Main Authors:	Deng, Ziheng; Liu, Xue; Jiang, Jiantong; Li, Yankai; Deng, Qingxu; Yang, Xiaochun
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2510.19301

Similar Items

ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems
by: Jiang, Yankai, et al.
Published: (2025)

SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding
by: Zhang, Ziyi, et al.
Published: (2025)

EcoLife: Carbon-Aware Serverless Function Scheduling for Sustainable Computing
by: Jiang, Yankai, et al.
Published: (2024)

STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
by: Chen, Yinfang, et al.
Published: (2025)

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient
by: Deng, Xiaoge, et al.
Published: (2021)

Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism
by: Wei, Jinhui, et al.
Published: (2025)

WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing
by: Jiang, Yankai, et al.
Published: (2025)

Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems
by: Liu, Guowei, et al.
Published: (2026)

DCSim: Computing and Networking Integration based Container Scheduling Simulator for Data Centers
by: Hu, Jinlong, et al.
Published: (2024)

Energy-Efficient Wireless Federated Learning via Doubly Adaptive Quantization
by: Han, Xuefeng, et al.
Published: (2024)

LLM Inference Serving: Survey of Recent Advances and Opportunities
by: Li, Baolin, et al.
Published: (2024)

GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference
by: Tran, Phuong, et al.
Published: (2025)

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)

FLASH: Federated Learning Across Simultaneous Heterogeneities
by: Chang, Xiangyu, et al.
Published: (2024)

Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions
by: Zhao, Hailiang, et al.
Published: (2024)

NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding
by: Chen, Jiefei, et al.
Published: (2026)

ML-based Adaptive Prefetching and Data Placement for US HEP Systems
by: Karanam, Venkat Sai Suman Lamba, et al.
Published: (2025)

INSPIRIT: Optimizing Heterogeneous Task Scheduling through Adaptive Priority in Task-based Runtime Systems
by: Wang, Yiqing, et al.
Published: (2024)

CXL Shared Memory Programming: Barely Distributed and Almost Persistent
by: Xu, Yi, et al.
Published: (2024)

Demystifying ARM SME to Optimize General Matrix Multiplications
by: Deng, Chencheng, et al.
Published: (2025)

FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing
by: Wu, Hao, et al.
Published: (2024)

Hamster: A Fast Synchronous Byzantine Fault Tolerance Protocol
by: Fu, Ximing, et al.
Published: (2024)

cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
by: Li, Zixuan, et al.
Published: (2024)

MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs
by: Zhang, Jiyuan, et al.
Published: (2026)

SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions
by: Liao, Gang, et al.
Published: (2024)

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
by: Chang, Li-Wen, et al.
Published: (2024)

MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)

Efficient Fault Tolerance for Pipelined Query Engines via Write-ahead Lineage
by: Wang, Ziheng, et al.
Published: (2024)

Next-Gen Computing Systems with Compute Express Link: a Comprehensive Survey
by: Chen, Chen, et al.
Published: (2024)

FuxiShuffle: An Adaptive and Resilient Shuffle Service for Distributed Data Processing on Alibaba Cloud
by: Lin, Yuhao, et al.
Published: (2026)

Modern Computing: Vision and Challenges
by: Gill, Sukhpal Singh, et al.
Published: (2024)

Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things
by: Wang, Ziheng, et al.
Published: (2024)

Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers
by: Hua, Qin, et al.
Published: (2024)

Autonomous Resource Management in Microservice Systems via Reinforcement Learning
by: Zou, Yujun, et al.
Published: (2025)

GICC: A High-Performance Runtime for GPU-Initiated Communication and Coordination in Modern HPC Systems
by: Shan, Baodi, et al.
Published: (2026)

Mesh-Attention: A New Communication-Efficient Distributed Attention with Improved Data Locality
by: Chen, Sirui, et al.
Published: (2025)

PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling
by: Liu, Chongpeng, et al.
Published: (2025)

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler
by: Zheng, Size, et al.
Published: (2025)

A Unified CPU-GPU Protocol for GNN Training
by: Lin, Yi-Chien, et al.
Published: (2024)

Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism
by: Li, Shengwei, et al.
Published: (2023)