Bibliographic Details
Main Authors: Ding, Yongliang, Bi, Qigong, Pu, Peng
Format: Preprint
Published: 2026
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2604.26422
author Ding, Yongliang
Bi, Qigong
Pu, Peng
contents Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, it remains challenging to model long-range dependency propagation and non-stationary, bursty workloads while keeping inference efficient at scale. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% in MAPE on average and achieves up to 12× faster CPU inference at N=32, the maximum span graph size in the Alibaba traces after preprocessing. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
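The abstract states that STLGT's graph Transformer runs in time linear in span graph size but does not describe its attention mechanism. As a point of reference only, the sketch below shows generic kernelized linear attention (the elu(x)+1 feature map from Katharopoulos et al., 2020) over N span-node embeddings, which is one well-known way to avoid the N x N attention matrix. This is an assumption for illustration, not the authors' method: it attends globally and omits the "structure-aware" component, and the function names (feature_map, linear_attention, quadratic_attention) are hypothetical.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it is always positive
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v):
    """Kernelized attention over N nodes (illustrative sketch, not STLGT).

    q, k: (N, d) query/key embeddings; v: (N, d_v) values.
    Cost is O(N * d * d_v), linear in N, because the key-value summary
    is aggregated once instead of forming an N x N score matrix.
    """
    phi_q = feature_map(q)        # (N, d)
    phi_k = feature_map(k)        # (N, d)
    kv = phi_k.T @ v              # (d, d_v): sum over all nodes of phi(k_j)^T v_j
    k_sum = phi_k.sum(axis=0)     # (d,): normalizer summary
    num = phi_q @ kv              # (N, d_v)
    den = phi_q @ k_sum           # (N,)
    return num / den[:, None]


def quadratic_attention(q, k, v):
    # Reference with the same kernel but an explicit N x N weight matrix: O(N^2 * d)
    scores = feature_map(q) @ feature_map(k).T
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_nodes, d = 32, 16  # N=32 mirrors the span graph size quoted in the abstract
    q = rng.standard_normal((n_nodes, d))
    k = rng.standard_normal((n_nodes, d))
    v = rng.standard_normal((n_nodes, d))
    out = linear_attention(q, k, v)
    print(out.shape, np.allclose(out, quadratic_attention(q, k, v)))
```

Both functions compute the same normalized weighted sum of values; the linear version only reorders the matrix products so that per-node cost does not grow with the number of spans.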
format Preprint
id arxiv_https___arxiv_org_abs_2604_26422
institution arXiv
publishDate 2026
record_format arxiv
title STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2604.26422