| Main Authors: | Makwana, Darshan, Jogi, Yash, Kotta, Harsh, Kubba, Aayush |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.11273 |
Similar Items
Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks
by: Makwana, Darshan
Published: (2025)
Improving Rare-Word Recognition of Whisper in Zero-Shot Settings
by: Jogi, Yash, et al.
Published: (2025)
Adopting Whisper for Confidence Estimation
by: Aggarwal, Vaibhav, et al.
Published: (2025)
MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing
by: Su, Zhaoyuan, et al.
Published: (2025)
SALSA: Speedy ASR-LLM Synchronous Aggregation
by: Mittal, Ashish, et al.
Published: (2024)
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference
by: Siavashi, Mohammad, et al.
Published: (2025)
Speech Diarization and ASR with GMM
by: Sharma, Aayush Kumar, et al.
Published: (2023)
Prompt-Aware Scheduling for Low-Latency LLM Serving
by: Tao, Yiheng, et al.
Published: (2025)
FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
by: Bin, Kyungmin, et al.
Published: (2025)
Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning
by: Wang, Tongxi, et al.
Published: (2026)
RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems
by: Wang, Zhengchao, et al.
Published: (2025)
Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
by: Gao, Shihong, et al.
Published: (2025)
Dislocation cartography: Representations and unsupervised classification of dislocation networks with unique fingerprints
by: Udofia, Benjamin, et al.
Published: (2024)
Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads
by: Song, Chendong, et al.
Published: (2026)
PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models
by: Cho, Eunyeong, et al.
Published: (2026)
FedTeddi: Temporal Drift and Divergence Aware Scheduling for Timely Federated Edge Learning
by: Bai, Yuxuan, et al.
Published: (2025)
ViBE: Co-Optimizing Workload Skew and Hardware Variability for MoE Serving
by: Go, Seokjin, et al.
Published: (2026)
Duration-Informed Workload Scheduler
by: Loreti, Daniela, et al.
Published: (2026)
Counterfactual Explanations Under Concept Drift
by: Kostrzewa, Marcin, et al.
Published: (2026)
Domain Generalization Under Posterior Drift
by: Zhu, Yilun, et al.
Published: (2025)
Locality-aware Fair Scheduling in LLM Serving
by: Cao, Shiyi, et al.
Published: (2025)
Fairness Under Group-Conditional Prior Probability Shift: Invariance, Drift, and Target-Aware Post-Processing
by: Asiaee, Amir, et al.
Published: (2026)
Taming the Tail: NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators
by: Shukla, Arnav, et al.
Published: (2025)
Optimal Linear Decay Learning Rate Schedules and Further Refinements
by: Defazio, Aaron, et al.
Published: (2023)
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
by: Srivatsa, Vikranth, et al.
Published: (2024)
PRISM: Fast Online LLM Serving via Scheduling-Memory Co-design
by: Qu, Xingyu, et al.
Published: (2026)
Privacy Drift: Evolving Privacy Concerns in Incremental Learning
by: Ahamed, Sayyed Farid, et al.
Published: (2024)
Assessing the Impact of Upselling in Online Fantasy Sports
by: Chaudhary, Aayush
Published: (2024)
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
by: Luo, Ziyue, et al.
Published: (2025)
Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC
by: Wei, Xinming, et al.
Published: (2025)
The Road Less Scheduled
by: Defazio, Aaron, et al.
Published: (2024)
Category-Aware Semantic Caching for Heterogeneous LLM Workloads
by: Wang, Chen, et al.
Published: (2025)
Llumnix: Dynamic Scheduling for Large Language Model Serving
by: Sun, Biao, et al.
Published: (2024)
The impact of allocation strategies in subset learning on the expressive power of neural networks
by: Schlisselberg, Ofir, et al.
Published: (2025)
Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling
by: Bian, Yiming, et al.
Published: (2026)
An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing
by: Dong, Hang, et al.
Published: (2024)
Hierarchical Semi-Markov Models with Duration-Aware Dynamics for Activity Sequences
by: Dube, Rohit, et al.
Published: (2025)
Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
by: Chen, Lequn, et al.
Published: (2023)
Online Detection of Water Contamination Under Concept Drift
by: Li, Jin, et al.
Published: (2025)
Signal-Aware Workload Shifting Algorithms with Uncertainty-Quantified Predictors
by: Johnson, Ezra, et al.
Published: (2025)