Saved in:
Bibliographic Details
Main Authors: Zhao, Yongxin, Zhang, Shenglin, Wu, Yujia, Sun, Yuxin, Sun, Yongqian, Pei, Dan, Bansal, Chetan, Ma, Minghua
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2511.08607
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • As modern software systems continue to grow in complexity, triage has become a fundamental process in system operations and maintenance. Triage aims to efficiently prioritize, assign, and assess issues to ensure the reliability of complex environments. The vast amount of heterogeneous data generated by software systems has made effective triage indispensable for maintaining reliability, facilitating maintainability, and enabling rapid issue response. Motivated by these challenges, researchers have devoted extensive effort to advancing triage automation and have achieved significant progress over the past two decades. This survey provides a comprehensive review of 234 papers from 2004 to the present, offering an in-depth examination of the fundamental concepts, system architecture, and problem statement. By comparing the distinct goals of academic and industrial research and by analyzing empirical studies of industrial practices, we identify the major obstacles that limit the practical deployment of triage systems. To assist practitioners in method selection and performance evaluation, we summarize widely adopted open-source datasets and evaluation metrics, providing a unified perspective on the measurement of triage effectiveness. Finally, we outline potential future directions and emerging opportunities to foster a closer integration between academic innovation and industrial application. All reviewed papers and projects are available at https://github.com/AIOps-Lab-NKU/TriageSurvey.