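Staff View: Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources :: Library Catalog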

Bibliographic Details
Main Authors:	Ba, Amadou; Harsha, Pavithra; Subramanian, Chitra
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2409.03103

_version_	1866908042167058432
author	Ba, Amadou; Harsha, Pavithra; Subramanian, Chitra
author_facet	Ba, Amadou; Harsha, Pavithra; Subramanian, Chitra
contents	Modern web services adopt cloud-native principles to leverage the advantages of microservices. To consistently guarantee high Quality of Service (QoS) according to Service Level Agreements (SLAs), ensure satisfactory user experiences, and minimize operational costs, each microservice must be provisioned with the right amount of resources. However, accurately provisioning microservices with adequate resources is complex and depends on many factors, including workload intensity and the complex interconnections between microservices. To address this challenge, we develop a model that captures the relationship between end-to-end latency, requests at the front-end level, and resource utilization. We then use the developed model to predict the end-to-end latency. Our solution leverages the Temporal Fusion Transformer (TFT), an attention-based architecture equipped with interpretability features. When the prediction results indicate SLA non-compliance, we use the feature importance provided by the TFT as covariates in Kernel Ridge Regression (KRR), with the response variable being the desired latency, to learn the parameters associated with the feature importance. These learned parameters reflect the adjustments required to the features to ensure SLA compliance. We demonstrate the merit of our approach with a microservice-based application and provide a roadmap to deployment.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_03103
institution	arXiv
publishDate	2024
record_format	arxiv
title	Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources
topic	Machine Learning
url	https://arxiv.org/abs/2409.03103

Similar Items