| Main Authors: | Deng, Ziheng; Liu, Xue; Jiang, Jiantong; Li, Yankai; Deng, Qingxu; Yang, Xiaochun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.19301 |
Similar Items
ThirstyFLOPS: Water Footprint Modeling and Analysis Toward Sustainable HPC Systems
by: Jiang, Yankai, et al.
Published: (2025)
SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding
by: Zhang, Ziyi, et al.
Published: (2025)
EcoLife: Carbon-Aware Serverless Function Scheduling for Sustainable Computing
by: Jiang, Yankai, et al.
Published: (2024)
STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
by: Chen, Yinfang, et al.
Published: (2025)
Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient
by: Deng, Xiaoge, et al.
Published: (2021)
Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism
by: Wei, Jinhui, et al.
Published: (2025)
WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing
by: Jiang, Yankai, et al.
Published: (2025)
Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems
by: Liu, Guowei, et al.
Published: (2026)
DCSim: Computing and Networking Integration based Container Scheduling Simulator for Data Centers
by: Hu, Jinlong, et al.
Published: (2024)
Energy-Efficient Wireless Federated Learning via Doubly Adaptive Quantization
by: Han, Xuefeng, et al.
Published: (2024)
LLM Inference Serving: Survey of Recent Advances and Opportunities
by: Li, Baolin, et al.
Published: (2024)
GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference
by: Tran, Phuong, et al.
Published: (2025)
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)
FLASH: Federated Learning Across Simultaneous Heterogeneities
by: Chang, Xiangyu, et al.
Published: (2024)
Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions
by: Zhao, Hailiang, et al.
Published: (2024)
NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding
by: Chen, Jiefei, et al.
Published: (2026)
ML-based Adaptive Prefetching and Data Placement for US HEP Systems
by: Karanam, Venkat Sai Suman Lamba, et al.
Published: (2025)
INSPIRIT: Optimizing Heterogeneous Task Scheduling through Adaptive Priority in Task-based Runtime Systems
by: Wang, Yiqing, et al.
Published: (2024)
CXL Shared Memory Programming: Barely Distributed and Almost Persistent
by: Xu, Yi, et al.
Published: (2024)
Demystifying ARM SME to Optimize General Matrix Multiplications
by: Deng, Chencheng, et al.
Published: (2025)
FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing
by: Wu, Hao, et al.
Published: (2024)
Hamster: A Fast Synchronous Byzantine Fault Tolerance Protocol
by: Fu, Ximing, et al.
Published: (2024)
cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
by: Li, Zixuan, et al.
Published: (2024)
MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs
by: Zhang, Jiyuan, et al.
Published: (2026)
SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions
by: Liao, Gang, et al.
Published: (2024)
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
by: Chang, Li-Wen, et al.
Published: (2024)
MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)
Efficient Fault Tolerance for Pipelined Query Engines via Write-ahead Lineage
by: Wang, Ziheng, et al.
Published: (2024)
Next-Gen Computing Systems with Compute Express Link: a Comprehensive Survey
by: Chen, Chen, et al.
Published: (2024)
FuxiShuffle: An Adaptive and Resilient Shuffle Service for Distributed Data Processing on Alibaba Cloud
by: Lin, Yuhao, et al.
Published: (2026)
Modern Computing: Vision and Challenges
by: Gill, Sukhpal Singh, et al.
Published: (2024)
Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things
by: Wang, Ziheng, et al.
Published: (2024)
Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers
by: Hua, Qin, et al.
Published: (2024)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning
by: Zou, Yujun, et al.
Published: (2025)
GICC: A High-Performance Runtime for GPU-Initiated Communication and Coordination in Modern HPC Systems
by: Shan, Baodi, et al.
Published: (2026)
Mesh-Attention: A New Communication-Efficient Distributed Attention with Improved Data Locality
by: Chen, Sirui, et al.
Published: (2025)
PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling
by: Liu, Chongpeng, et al.
Published: (2025)
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler
by: Zheng, Size, et al.
Published: (2025)
A Unified CPU-GPU Protocol for GNN Training
by: Lin, Yi-Chien, et al.
Published: (2024)
Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism
by: Li, Shengwei, et al.
Published: (2023)