:: Library Catalog

Bibliographic Details
Main Authors:	Makwana, Darshan; Jogi, Yash; Kotta, Harsh; Kubba, Aayush
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.11273

Similar Items

Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks
by: Makwana, Darshan
Published: (2025)

Improving Rare-Word Recognition of Whisper in Zero-Shot Settings
by: Jogi, Yash, et al.
Published: (2025)

Adopting Whisper for Confidence Estimation
by: Aggarwal, Vaibhav, et al.
Published: (2025)

MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing
by: Su, Zhaoyuan, et al.
Published: (2025)

SALSA: Speedy ASR-LLM Synchronous Aggregation
by: Mittal, Ashish, et al.
Published: (2024)

Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference
by: Siavashi, Mohammad, et al.
Published: (2025)

Speech Diarization and ASR with GMM
by: Sharma, Aayush Kumar, et al.
Published: (2023)

Prompt-Aware Scheduling for Low-Latency LLM Serving
by: Tao, Yiheng, et al.
Published: (2025)

FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
by: Bin, Kyungmin, et al.
Published: (2025)

Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning
by: Wang, Tongxi, et al.
Published: (2026)

RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems
by: Wang, Zhengchao, et al.
Published: (2025)

Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
by: Gao, Shihong, et al.
Published: (2025)

Dislocation cartography: Representations and unsupervised classification of dislocation networks with unique fingerprints
by: Udofia, Benjamin, et al.
Published: (2024)

Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads
by: Song, Chendong, et al.
Published: (2026)

PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models
by: Cho, Eunyeong, et al.
Published: (2026)

FedTeddi: Temporal Drift and Divergence Aware Scheduling for Timely Federated Edge Learning
by: Bai, Yuxuan, et al.
Published: (2025)

ViBE: Co-Optimizing Workload Skew and Hardware Variability for MoE Serving
by: Go, Seokjin, et al.
Published: (2026)

Duration-Informed Workload Scheduler
by: Loreti, Daniela, et al.
Published: (2026)

Counterfactual Explanations Under Concept Drift
by: Kostrzewa, Marcin, et al.
Published: (2026)

Domain Generalization Under Posterior Drift
by: Zhu, Yilun, et al.
Published: (2025)

Locality-aware Fair Scheduling in LLM Serving
by: Cao, Shiyi, et al.
Published: (2025)

Fairness Under Group-Conditional Prior Probability Shift: Invariance, Drift, and Target-Aware Post-Processing
by: Asiaee, Amir, et al.
Published: (2026)

Taming the Tail: NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators
by: Shukla, Arnav, et al.
Published: (2025)

Optimal Linear Decay Learning Rate Schedules and Further Refinements
by: Defazio, Aaron, et al.
Published: (2023)

Preble: Efficient Distributed Prompt Scheduling for LLM Serving
by: Srivatsa, Vikranth, et al.
Published: (2024)

PRISM: Fast Online LLM Serving via Scheduling-Memory Co-design
by: Qu, Xingyu, et al.
Published: (2026)

Privacy Drift: Evolving Privacy Concerns in Incremental Learning
by: Ahamed, Sayyed Farid, et al.
Published: (2024)

Assessing the Impact of Upselling in Online Fantasy Sports
by: Chaudhary, Aayush
Published: (2024)

Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
by: Luo, Ziyue, et al.
Published: (2025)

Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC
by: Wei, Xinming, et al.
Published: (2025)

The Road Less Scheduled
by: Defazio, Aaron, et al.
Published: (2024)

Category-Aware Semantic Caching for Heterogeneous LLM Workloads
by: Wang, Chen, et al.
Published: (2025)

Llumnix: Dynamic Scheduling for Large Language Model Serving
by: Sun, Biao, et al.
Published: (2024)

The impact of allocation strategies in subset learning on the expressive power of neural networks
by: Schlisselberg, Ofir, et al.
Published: (2025)

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling
by: Bian, Yiming, et al.
Published: (2026)

An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing
by: Dong, Hang, et al.
Published: (2024)

Hierarchical Semi-Markov Models with Duration-Aware Dynamics for Activity Sequences
by: Dube, Rohit, et al.
Published: (2025)

Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
by: Chen, Lequn, et al.
Published: (2023)

Online Detection of Water Contamination Under Concept Drift
by: Li, Jin, et al.
Published: (2025)

Signal-Aware Workload Shifting Algorithms with Uncertainty-Quantified Predictors
by: Johnson, Ezra, et al.
Published: (2025)