:: Library Catalog

Bibliographic Details
Main Authors:	He, Jingkai; Li, Tianjian; Feng, Erhu; Du, Dong; Liu, Qian; Liu, Tao; Xia, Yubin; Chen, Haibo
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2508.18588

Similar Items

Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
by: Feng, Dahu, et al.
Published: (2025)

Characterizing Mobile SoC for Accelerating Heterogeneous LLM Inference
by: Chen, Le, et al.
Published: (2025)

Jiagu: Optimizing Serverless Computing Resource Utilization with Harmonized Efficiency and Practicability
by: Liu, Qingyuan, et al.
Published: (2024)

HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments
by: He, Yongjun, et al.
Published: (2025)

Schedule-Level Shared-Prefix Reuse for LLM RL Training
by: Li, Pengbo, et al.
Published: (2026)

HiRL: Hierarchical Reinforcement Learning for Coordinated Resource Management in Heterogeneous Edge Computing
by: Zhu, Jianyong, et al.
Published: (2026)

LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)

DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training
by: Wang, Zhixin, et al.
Published: (2025)

Accelerating Compound LLM Training Workloads with Maestro
by: Yuan, Xiulong, et al.
Published: (2026)

FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)

PICO: Accelerating All k-Core Paradigms on GPU
by: Zhao, Chen, et al.
Published: (2024)

PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving
by: Huang, Weizhe, et al.
Published: (2025)

Polar: Agentic RL on Any Harness at Scale
by: Xu, Binfeng, et al.
Published: (2026)

Xorbits: Automating Operator Tiling for Distributed Data Science
by: Lu, Weizheng, et al.
Published: (2023)

FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism
by: Chen, Huamin, et al.
Published: (2026)

OServe: Accelerating LLM Serving via Spatial-Temporal Workload Orchestration
by: Jiang, Youhe, et al.
Published: (2026)

inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference
by: Chen, Huamin, et al.
Published: (2026)

Minions: Accelerating Large Language Model Inference with Aggregated Speculative Execution
by: Wang, Siqi, et al.
Published: (2024)

The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
by: Chen, Huamin, et al.
Published: (2026)

OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency
by: Wang, Jun, et al.
Published: (2025)

A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation
by: He, Jinghai, et al.
Published: (2024)

DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline
by: Xue, Zhenliang, et al.
Published: (2025)

PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR
by: Zhang, Yiqi, et al.
Published: (2026)

Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
by: Li, Zhe, et al.
Published: (2024)

Towards Lock Modularization for Heterogeneous Environments
by: Zhang, Hanze, et al.
Published: (2025)

Parallel Collaborative ADMM Privacy Computing and Adaptive GPU Acceleration for Distributed Edge Networks
by: Xia, Mengchun, et al.
Published: (2026)

Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
by: Liu, Yi, et al.
Published: (2025)

Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks
by: Ma, Mulei, et al.
Published: (2025)

WWW.Serve: Interconnecting Global LLM Services through Decentralization
by: Wang, Huanyu, et al.
Published: (2026)

HLoRA: Efficient Federated Learning System for LLM Heterogeneous Fine-Tuning
by: Liu, Qianli, et al.
Published: (2025)

LLM-Enhanced Deep Reinforcement Learning for Task Offloading in Collaborative Edge Computing
by: Guo, Hao, et al.
Published: (2026)

MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
by: Zhao, Bohan, et al.
Published: (2025)

Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)

LAAFD: LLM-based Agents for Accelerated FPGA Design
by: Moraru, Maxim, et al.
Published: (2026)

Jenga: Effective Memory Management for Serving LLM with Heterogeneity
by: Zhang, Chen, et al.
Published: (2025)

AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality
by: Bournias, Ilias, et al.
Published: (2024)

exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design
by: Moraru, Maxim, et al.
Published: (2025)

A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs
by: Kolker-Hicks, Elliot, et al.
Published: (2024)

Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning
by: Liu, Zhibang, et al.
Published: (2025)

DynaShard: Secure and Adaptive Blockchain Sharding Protocol with Hybrid Consensus and Dynamic Shard Management
by: Liu, Ao, et al.
Published: (2024)