| Main Authors: | He, Jingkai, Li, Tianjian, Feng, Erhu, Du, Dong, Liu, Qian, Liu, Tao, Xia, Yubin, Chen, Haibo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.18588 |
Similar Items
Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
by: Feng, Dahu, et al.
Published: (2025)
Characterizing Mobile SoC for Accelerating Heterogeneous LLM Inference
by: Chen, Le, et al.
Published: (2025)
Jiagu: Optimizing Serverless Computing Resource Utilization with Harmonized Efficiency and Practicability
by: Liu, Qingyuan, et al.
Published: (2024)
HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments
by: He, Yongjun, et al.
Published: (2025)
Schedule-Level Shared-Prefix Reuse for LLM RL Training
by: Li, Pengbo, et al.
Published: (2026)
HiRL: Hierarchical Reinforcement Learning for Coordinated Resource Management in Heterogeneous Edge Computing
by: Zhu, Jianyong, et al.
Published: (2026)
LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)
DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training
by: Wang, Zhixin, et al.
Published: (2025)
Accelerating Compound LLM Training Workloads with Maestro
by: Yuan, Xiulong, et al.
Published: (2026)
FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)
PICO: Accelerating All k-Core Paradigms on GPU
by: Zhao, Chen, et al.
Published: (2024)
PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving
by: Huang, Weizhe, et al.
Published: (2025)
Polar: Agentic RL on Any Harness at Scale
by: Xu, Binfeng, et al.
Published: (2026)
Xorbits: Automating Operator Tiling for Distributed Data Science
by: Lu, Weizheng, et al.
Published: (2023)
FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism
by: Chen, Huamin, et al.
Published: (2026)
OServe: Accelerating LLM Serving via Spatial-Temporal Workload Orchestration
by: Jiang, Youhe, et al.
Published: (2026)
inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
Minions: Accelerating Large Language Model Inference with Aggregated Speculative Execution
by: Wang, Siqi, et al.
Published: (2024)
The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
by: Chen, Huamin, et al.
Published: (2026)
OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency
by: Wang, Jun, et al.
Published: (2025)
A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation
by: He, Jinghai, et al.
Published: (2024)
DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline
by: Xue, Zhenliang, et al.
Published: (2025)
PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR
by: Zhang, Yiqi, et al.
Published: (2026)
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
by: Li, Zhe, et al.
Published: (2024)
Towards Lock Modularization for Heterogeneous Environments
by: Zhang, Hanze, et al.
Published: (2025)
Parallel Collaborative ADMM Privacy Computing and Adaptive GPU Acceleration for Distributed Edge Networks
by: Xia, Mengchun, et al.
Published: (2026)
Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
by: Liu, Yi, et al.
Published: (2025)
Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks
by: Ma, Mulei, et al.
Published: (2025)
WWW.Serve: Interconnecting Global LLM Services through Decentralization
by: Wang, Huanyu, et al.
Published: (2026)
HLoRA: Efficient Federated Learning System for LLM Heterogeneous Fine-Tuning
by: Liu, Qianli, et al.
Published: (2025)
LLM-Enhanced Deep Reinforcement Learning for Task Offloading in Collaborative Edge Computing
by: Guo, Hao, et al.
Published: (2026)
MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
by: Zhao, Bohan, et al.
Published: (2025)
Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)
LAAFD: LLM-based Agents for Accelerated FPGA Design
by: Moraru, Maxim, et al.
Published: (2026)
Jenga: Effective Memory Management for Serving LLM with Heterogeneity
by: Zhang, Chen, et al.
Published: (2025)
AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality
by: Bournias, Ilias, et al.
Published: (2024)
exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design
by: Moraru, Maxim, et al.
Published: (2025)
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs
by: Kolker-Hicks, Elliot, et al.
Published: (2024)
Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning
by: Liu, Zhibang, et al.
Published: (2025)
DynaShard: Secure and Adaptive Blockchain Sharding Protocol with Hybrid Consensus and Dynamic Shard Management
by: Liu, Ao, et al.
Published: (2024)