:: Library Catalog

Bibliographic Details
Main Authors:	Jia, Haojie; Li, Zhenhao; Li, Gen; Xu, Minxian; Ye, Kejiang
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2505.23258

Similar Items

Collaborative Resource Management and Workloads Scheduling in Cloud-Assisted Mobile Edge Computing across Timescales
by: Tang, Lujie, et al.
Published: (2024)

TempoScale: A Cloud Workloads Prediction Approach Integrating Short-Term and Long-Term Information
by: Wen, Linfeng, et al.
Published: (2024)

BrownoutServe: SLO-Aware Inference Serving under Bursty Workloads for MoE-based LLMs
by: Hu, Jianmin, et al.
Published: (2025)

MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices
by: Hu, Kan, et al.
Published: (2024)

An Interference-aware Approach for Co-located Container Orchestration with Novel Metric
by: Li, Xiang, et al.
Published: (2024)

DRPC: Distributed Reinforcement Learning Approach for Scalable Resource Provisioning in Container-based Clusters
by: Bai, Haoyu, et al.
Published: (2024)

LSRAM: A Lightweight Autoscaling and SLO Resource Allocation Framework for Microservices Based on Gradient Descent
by: Hu, Kan, et al.
Published: (2024)

TD3-Sched: Learning to Orchestrate Container-based Cloud-Edge Resources via Distributed Reinforcement Learning
by: Song, Shengye, et al.
Published: (2025)

BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving
by: Zheng, Wanyi, et al.
Published: (2025)

UELLM: A Unified and Efficient Approach for LLM Inference Serving
by: He, Yiyuan, et al.
Published: (2024)

Cloud Native System for LLM Inference Serving
by: Xu, Minxian, et al.
Published: (2025)

Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
by: Wu, Jingfeng, et al.
Published: (2025)

CloudNativeSim: a toolkit for modeling and simulation of cloud-native applications
by: Wu, Jingfeng, et al.
Published: (2024)

Auto-scaling Approaches for Microservice Applications: A Survey and Taxonomy
by: Xu, Minxian, et al.
Published: (2025)

DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving
by: Liao, Junhan, et al.
Published: (2025)

C-Koordinator: Interference-aware Management for Large-scale and Co-located Microservice Clusters
by: Song, Shengye, et al.
Published: (2025)

BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2025)

StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications
by: Wen, Linfeng, et al.
Published: (2024)

Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters
by: Iserte, Sergio, et al.
Published: (2025)

ARC-V: Vertical Resource Adaptivity for HPC Workloads in Containerized Environments
by: Medeiros, Daniel, et al.
Published: (2025)

Crossword: Adaptive Consensus for Dynamic Data-Heavy Workloads
by: Hu, Guanzhou, et al.
Published: (2025)

Workflow-Driven Modeling for the Compute Continuum: An Optimization Approach to Automated System and Workload Scheduling
by: Sharma, Aasish Kumar, et al.
Published: (2025)

FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
by: Lin, Yanying, et al.
Published: (2025)

ORACL: Optimized Reasoning for Autoscaling via Chain of Thought with LLMs for Microservices
by: Bai, Haoyu, et al.
Published: (2026)

Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda
by: Xu, Minxian, et al.
Published: (2026)

Dynamic Client Clustering, Bandwidth Allocation, and Workload Optimization for Semi-synchronous Federated Learning
by: Yu, Liangkun, et al.
Published: (2024)

PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)

GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads
by: Ye, Fanjiang, et al.
Published: (2026)

A Review of Tools and Techniques for Optimization of Workload Mapping and Scheduling in Heterogeneous HPC System
by: Sharma, Aasish Kumar, et al.
Published: (2025)

Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks
by: Larsson, Oliver, et al.
Published: (2026)

BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems
by: Wang, Yuxin, et al.
Published: (2024)

Edge AI: A Taxonomy, Systematic Review and Future Directions
by: Gill, Sukhpal Singh, et al.
Published: (2024)

TierBase: A Workload-Driven Cost-Optimized Key-Value Store
by: Shen, Zhitao, et al.
Published: (2025)

Data Management System Analysis for Distributed Computing Workloads
by: Hsu, Kuan-Chieh, et al.
Published: (2025)

Orchestrating Mixed-Criticality Cloud Workloads in Reconfigurable Manufacturing Systems
by: Barletta, Marco, et al.
Published: (2024)

DynaShard: Secure and Adaptive Blockchain Sharding Protocol with Hybrid Consensus and Dynamic Shard Management
by: Liu, Ao, et al.
Published: (2024)

Multi-Layer Scheduling for MoE-Based LLM Reasoning
by: Sun, Yifan, et al.
Published: (2026)

HiveMind: OS-Inspired Scheduling for Concurrent LLM Agent Workloads
by: Agyemang, Justice Owusu, et al.
Published: (2026)

Profiling and Modeling of Power Characteristics of Leadership-Scale HPC System Workloads
by: Karimi, Ahmad Maroof, et al.
Published: (2024)

Characterizing Production GPU Workloads using System-wide Telemetry Data
by: Cankur, Onur, et al.
Published: (2025)