:: Library Catalog

Bibliographic Details
Main authors:	Xuan, Mo, Yue, Zhang, Weigang, Wu
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online access:	https://arxiv.org/abs/2509.06362

Similar Items

Tangram: High-resolution Video Analytics on Serverless Platform with SLO-aware Batching
by: Peng, Haosong, et al.
Published: (2024)

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)

HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
by: Chen, Jiabin, et al.
Published: (2024)

SLO-Aware Scheduling for Large Language Model Inferences
by: Huang, Jinqi, et al.
Published: (2025)

EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration
by: Du, Jiangsu, et al.
Published: (2025)

HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences
by: Gu, Jianfeng, et al.
Published: (2025)

Memory Offloading for Large Language Model Inference with Latency SLO Guarantees
by: Ma, Chenxiang, et al.
Published: (2025)

SCOOT: SLO-Oriented Performance Tuning for LLM Inference Engines
by: Cheng, Ke, et al.
Published: (2024)

JITServe: SLO-aware LLM Serving with Imprecise Request Information
by: Zhang, Wei, et al.
Published: (2025)

Hummingbird: SLO-Oriented GPU Preemption at Microsecond-scale
by: Hu, Tiancheng, et al.
Published: (2026)

PromptTuner: SLO-Aware Elastic System for LLM Prompt Tuning
by: Gao, Wei, et al.
Published: (2026)

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)

Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving
by: Nie, Chengyi, et al.
Published: (2024)

SLO-Aware Task Offloading within Collaborative Vehicle Platoons
by: Sedlak, Boris, et al.
Published: (2024)

An Interference-aware Approach for Co-located Container Orchestration with Novel Metric
by: Li, Xiang, et al.
Published: (2024)

SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices
by: Chow, Will
Published: (2025)

Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference
by: Wang, Qipeng
Published: (2026)

EconoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving
by: Shen, Haiying, et al.
Published: (2024)

GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
by: Duan, Jiaang, et al.
Published: (2025)

PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving
by: Sun, Desen, et al.
Published: (2025)

HeRo: Adaptive Orchestration of Agentic RAG on Heterogeneous Mobile SoC
by: Li, Maoliang, et al.
Published: (2026)

Resource Slicing through Intelligent Orchestration of Energy-aware IoT services in Edge-Cloud Continuum
by: Shahid, Hafiz Faheem, et al.
Published: (2024)

LSRAM: A Lightweight Autoscaling and SLO Resource Allocation Framework for Microservices Based on Gradient Descent
by: Hu, Kan, et al.
Published: (2024)

AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System
by: Bai, Fengyao, et al.
Published: (2026)

BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs
by: Hu, Jianmin, et al.
Published: (2025)

MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices
by: Hu, Kan, et al.
Published: (2024)

pBeeGees: A Prudent Approach to Certificate-Decoupled BFT Consensus
by: Yang, Kaiji, et al.
Published: (2025)

A House United Within Itself: SLO-Awareness for On-Premises Containerized ML Inference Clusters via Faro
by: Jeon, Beomyeol, et al.
Published: (2024)

Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
by: Mo, Zizhao, et al.
Published: (2024)

Memory-aware Adaptive Scheduling of Scientific Workflows on Heterogeneous Architectures
by: Kulagina, Svetlana, et al.
Published: (2025)

QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration
by: Kumar, Satyam, et al.
Published: (2026)

PolyServe: Efficient Multi-SLO Serving at Scale
by: Zhu, Kan, et al.
Published: (2025)

An SLO Driven and Cost-Aware Autoscaling Framework for Kubernetes
by: Punniyamoorthy, Vinoth, et al.
Published: (2025)

SLOs-Serve: Optimized Serving of Multi-SLO LLMs
by: Chen, Siyuan, et al.
Published: (2025)

Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning
by: Saba, Issa, et al.
Published: (2024)

Mitigating Artifacts in Pre-quantization Based Scientific Data Compressors with Quantization-aware Interpolation
by: Jiao, Pu, et al.
Published: (2026)

Hexa-MoE: Efficient and Heterogeneous-aware Training for Mixture-of-Experts
by: Luo, Shuqing, et al.
Published: (2024)

CASA: A Framework for SLO and Carbon-Aware Autoscaling and Scheduling in Serverless Cloud Computing
by: Qi, S., et al.
Published: (2024)

A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving
by: Zhang, Yue, et al.
Published: (2025)

GOGH: Correlation-Guided Orchestration of GPUs in Heterogeneous Clusters
by: Raeisi, Ahmad, et al.
Published: (2025)