| Main Authors: | Harshbarger, Ian; Chidambaram, Calvin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.09018 |
Similar Items
Prompt-Aware Scheduling for Efficient Text-to-Image Inferencing System
by: Agarwal, Shubham, et al.
Published: (2025)
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)
AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems
by: Chen, Yulang, et al.
Published: (2026)
ML Inference Scheduling with Predictable Latency
by: Zhao, Haidong, et al.
Published: (2025)
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
by: Oh, Hyungjun, et al.
Published: (2024)
Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach
by: Vora, Manav, et al.
Published: (2024)
Semantic Scheduling for LLM Inference
by: Hua, Wenyue, et al.
Published: (2025)
Optimal Inference Schedules for Masked Diffusion Models
by: Chen, Sitan, et al.
Published: (2025)
PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference
by: Zhang, Zeyu, et al.
Published: (2024)
CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC
by: Alqudah, Eman, et al.
Published: (2025)
Efficient LLM Scheduling by Learning to Rank
by: Fu, Yichao, et al.
Published: (2024)
A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference
by: Puigdemont, Pol, et al.
Published: (2024)
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference
by: Siavashi, Mohammad, et al.
Published: (2025)
Mamba Meets Scheduling: Learning to Solve Flexible Job Shop Scheduling with Efficient Sequence Modeling
by: Cao, Zhi, et al.
Published: (2026)
Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training
by: Wu, Yanru, et al.
Published: (2026)
Energy-Efficient Scheduling with Predictions
by: Balkanski, Eric, et al.
Published: (2024)
Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees
by: Dong, Zhuolun, et al.
Published: (2026)
Mining--Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling
by: Banerjee, Chayan, et al.
Published: (2025)
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
by: Dai, J. G., et al.
Published: (2025)
Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission
by: Tang, Zengzipeng, et al.
Published: (2026)
Duration Aware Scheduling for ASR Serving Under Workload Drift
by: Makwana, Darshan, et al.
Published: (2026)
Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
by: Li, Derek, et al.
Published: (2025)
CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning
by: He, Guoliang, et al.
Published: (2025)
RL-MSA: a Reinforcement Learning-based Multi-line bus Scheduling Approach
by: Liu, Yingzhuo
Published: (2024)
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
by: Yu, Jiahuan, et al.
Published: (2026)
Online Scheduling for LLM Inference with KV Cache Constraints
by: Jaillet, Patrick, et al.
Published: (2025)
Optimal Scheduling Algorithms for LLM Inference: Theory and Practice
by: Bari, Agrim, et al.
Published: (2025)
LayerScope: Predictive Cross-Layer Scheduling for Efficient Multi-Batch MoE Inference on Legacy Servers
by: Yu, Enda, et al.
Published: (2025)
Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling
by: Zhang, Xianzhi, et al.
Published: (2026)
Adaptive Slimming for Scalable and Efficient Speech Enhancement
by: Miccini, Riccardo, et al.
Published: (2025)
SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling
by: Wang, Jinghui, et al.
Published: (2025)
Design and Scheduling of an AI-based Queueing System
by: Lee, Jiung, et al.
Published: (2024)
Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
by: Gao, Shihong, et al.
Published: (2025)
Green Federated Learning via Carbon-Aware Client and Time Slot Scheduling
by: Arputharaj, Daniel Richards, et al.
Published: (2025)
Score-Optimal Diffusion Schedules
by: Williams, Christopher, et al.
Published: (2024)
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems
by: Wang, Yifei, et al.
Published: (2026)
LLM-Guided Runtime Parameter Optimization for Energy-Efficient Model Inference
by: Crumpacker, Katelyn, et al.
Published: (2026)
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
by: Zhong, Shuzhang, et al.
Published: (2025)
ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor
by: Choi, Seungbeom, et al.
Published: (2025)
Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters
by: Moore, Hayden, et al.
Published: (2025)