:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Jing; Wang, Lin; Deng, Quanfeng; Yu, Chen; Zhang, Dong; Yan, Bingheng; Liu, Fangming
Type:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2502.14320

Similar Items

DeepServe: Serverless Large Language Model Serving at Scale
by: Hu, Junhao, et al.
Published: (2025)

Dilu: Enabling GPU Resourcing-on-Demand for Serverless DL Serving via Introspective Elasticity
by: Lv, Cunchi, et al.
Published: (2025)

HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
by: Chen, Jiabin, et al.
Published: (2024)

HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds
by: Lou, Chiheng, et al.
Published: (2025)

Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving
by: Pagonas, Nikos, et al.
Published: (2025)

Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
by: Yu, Minchen, et al.
Published: (2023)

AARC: Automated Affinity-aware Resource Configuration for Serverless Workflows
by: Jin, Lingxiao, et al.
Published: (2025)

Joint$λ$: Orchestrating Serverless Workflows on Jointcloud FaaS Systems
by: Li, Rui, et al.
Published: (2025)

GeoFF: Federated Serverless Workflows with Data Pre-Fetching
by: Carl, Natalie, et al.
Published: (2024)

sAirflow: Adopting Serverless in a Legacy Workflow Scheduler
by: Mikina, Filip, et al.
Published: (2024)

Jiagu: Optimizing Serverless Computing Resource Utilization with Harmonized Efficiency and Practicability
by: Liu, Qingyuan, et al.
Published: (2024)

Leveraging Core and Uncore Frequency Scaling for Power-Efficient Serverless Workflows
by: Tzenetopoulos, Achilleas, et al.
Published: (2024)

FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
by: Lin, Yanying, et al.
Published: (2025)

Towards Resource-Efficient Serverless LLM Inference with SLINFER
by: Xu, Chuhao, et al.
Published: (2025)

FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing
by: Wu, Hao, et al.
Published: (2024)

Cosmos: A Cost Model for Serverless Workflows in the 3D Compute Continuum
by: Marcelino, Cynthia, et al.
Published: (2025)

ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters
by: Farahani, Reza, et al.
Published: (2026)

Software Resource Disaggregation for HPC with Serverless Computing
by: Copik, Marcin, et al.
Published: (2024)

HexAGenT: Efficient Agentic LLM Serving via Workflow- and Heterogeneity-Aware Scheduling
by: Peng, You, et al.
Published: (2026)

OTAS: An Elastic Transformer Serving System via Token Adaptation
by: Chen, Jinyu, et al.
Published: (2024)

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs
by: Hui, Xinning, et al.
Published: (2024)

Databelt: A Continuous Data Path for Serverless Workflows in the 3D Compute Continuum
by: Marcelino, Cynthia, et al.
Published: (2025)

Truffle: Efficient Data Passing for Data-Intensive Serverless Workflows in the Edge-Cloud Continuum
by: Marcelino, Cynthia, et al.
Published: (2024)

Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows
by: Dai, Yinwei, et al.
Published: (2025)

GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources
by: Du, Boxiao, et al.
Published: (2026)

Dependency-aware Resource Allocation for Serverless Functions at the Edge
by: Baresi, Luciano, et al.
Published: (2023)

Serverless Approach to Running Resource-Intensive STAR Aligner
by: Kica, Piotr, et al.
Published: (2025)

Serverless Everywhere: A Comparative Analysis of WebAssembly Workflows Across Browser, Edge, and Cloud
by: Colosi, Mario, et al.
Published: (2025)

Boosting LLM Serving through Spatial-Temporal GPU Resource Sharing
by: Lin, Zejia, et al.
Published: (2025)

ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving
by: Huang, Tao, et al.
Published: (2024)

Making Serverless Computing Extensible: A Case Study of Serverless Data Analytics
by: Yu, Minchen, et al.
Published: (2025)

ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity
by: Chen, Long, et al.
Published: (2025)

AgentServe: Algorithm-System Co-Design for Efficient Agentic AI Serving on a Consumer-Grade GPU
by: Zhang, Yuning, et al.
Published: (2026)

SeBS-Flow: Benchmarking Serverless Cloud Function Workflows
by: Schmid, Larissa, et al.
Published: (2024)

Adaptive Resource Allocation for Workflow Containerization on Kubernetes
by: Shan, Chenggang, et al.
Published: (2023)

Cicada: A Pipeline-Efficient Approach to Serverless Inference with Decoupled Management
by: Wu, Z., et al.
Published: (2025)

Zenix: Efficient Execution of Bulky Serverless Applications
by: Guo, Zhiyuan, et al.
Published: (2022)

MoEless: Efficient MoE LLM Serving via Serverless Computing
by: Yu, Hanfei, et al.
Published: (2026)

A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving
by: Zhang, Yue, et al.
Published: (2025)

EconoServe: Maximizing Multi-Resource Utilization with SLO Guarantees in LLM Serving
by: Shen, Haiying, et al.
Published: (2024)