| Main Authors: | Ding, Yongliang; Bi, Qigong; Pu, Peng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2604.26422 |
| _version_ | 1866917447076937728 |
|---|---|
| author | Ding, Yongliang Bi, Qigong Pu, Peng |
| author_facet | Ding, Yongliang Bi, Qigong Pu, Peng |
| contents | Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, matching the maximum span graph size after preprocessing the Alibaba traces. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2604_26422 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices |
| topic | Machine Learning Artificial Intelligence |
| url | https://arxiv.org/abs/2604.26422 |
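
The abstract reports results in terms of p95 tail latency and MAPE. As a rough illustration of those two quantities only, the sketch below computes a nearest-rank 95th percentile per time window and the mean absolute percentage error between actual and forecast values. The function names, the nearest-rank percentile definition, and the sample numbers are assumptions made for this example; the record does not say how the authors compute either metric.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (assumed definition)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest-rank index, ceil(0.95 * n), in integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical per-window latency samples in milliseconds (not from the paper).
windows = [
    [float(x) for x in range(1, 21)],      # 1 ms .. 20 ms
    [2.0 * x for x in range(1, 21)],       # 2 ms .. 40 ms
]
actual_p95 = [p95(w) for w in windows]

# Hypothetical forecasts for the same two windows.
forecast_p95 = [20.9, 34.2]
error_pct = mape(actual_p95, forecast_p95)
print(actual_p95, round(error_pct, 2))
```

Each forecast here is off by 10% of the actual value, so the MAPE comes out at about 10%. A multi-step forecaster would be scored the same way, with one such pair per predicted horizon step.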