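Staff View: STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices :: Library Catalog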


Bibliographic Details
Main Authors:	Ding, Yongliang; Bi, Qigong; Pu, Peng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.26422

_version_	1866917447076937728
author	Ding, Yongliang; Bi, Qigong; Pu, Peng
author_facet	Ding, Yongliang; Bi, Qigong; Pu, Peng
contents	Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, matching the maximum span graph size after preprocessing the Alibaba traces. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_26422
institution	arXiv
publishDate	2026
record_format	arxiv
title	STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2604.26422
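
The contents field above reports results in terms of p95 tail latency and MAPE. As a reading aid, the following is a minimal sketch of how those two quantities are commonly computed. It is not taken from the paper: the nearest-rank percentile definition, the function names `p95` and `mape`, and all sample numbers are assumptions made for illustration only.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies.

    Assumption: nearest-rank definition; the paper may use another one.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical end-to-end latencies (ms) for one API, grouped by time window.
windows = [
    [10, 12, 11, 13, 200],
    [9, 10, 11, 12, 14],
]
actual_p95 = [p95(w) for w in windows]

# Made-up forecasts for the same two windows, for illustration only.
forecast_p95 = [190.0, 14.0]
error_percent = mape(actual_p95, forecast_p95)
```

With so few samples per window, the nearest-rank p95 is simply the largest value, which shows how a single slow request dominates the tail metric that the forecaster targets.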
