Bibliographic Details
Main Authors: Lin, Yuqiang; Chen, Kehua; Lockyer, Sam; Yadav, Arjun; Sui, Mingxuan; Zhang, Shucheng; Shi, Yan; Wang, Bingzhang; Zhang, Yuang; Zarbock, Markus; Stanek, Florain; Evans, Adrian; Li, Wenbin; Wang, Yinhai; Zhang, Nic
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2603.19098
Abstract:
  • Traffic Anomaly Understanding (TAU) is important for traffic safety in Intelligent Transportation Systems. Recent vision-language models (VLMs) have shown strong capabilities in video understanding. However, progress on TAU remains limited due to the lack of benchmarks and task-specific methodologies. To address this limitation, we introduce Roundabout-TAU, a dataset constructed from real-world roundabout videos collected in collaboration with the City of Carmel, Indiana. The dataset contains 342 clips and is annotated with more than 2,000 question-answer pairs covering multiple aspects of traffic anomaly understanding. Building on this benchmark, we propose TAU-R1, a two-layer vision-language framework for TAU. The first layer is a lightweight anomaly classifier that performs coarse anomaly categorisation, while the second layer is a larger anomaly reasoner that generates detailed event summaries. To improve task-specific reasoning, we introduce a two-stage training strategy consisting of decomposed-QA-enhanced supervised fine-tuning followed by TAU-GRPO, a GRPO-based post-training method with TAU-specific reward functions. Experimental results show that TAU-R1 achieves strong performance on both anomaly classification and reasoning tasks while maintaining deployment efficiency. The dataset and code are available at: https://github.com/siri-rouser/TAU-R1