Bibliographic Details
Main Authors: Wagner, Royden; Tas, Omer Sahin; Villa, Jaime; Hauser, Felix; Shen, Yinzhe; Steiner, Marlon; Strutz, Dominik; Fernandez, Carlos; Kinzig, Christian; Guitierrez-Cabello, Guillermo S.; Königshof, Hendrik; Immel, Fabian; Schwarzkopf, Richard; Rack, Nils Alexander; Rösch, Kevin; Wang, Kaiwen; Pauls, Jan-Hendrik; Lauer, Martin; Gilitschenski, Igor; Caesar, Holger; Stiller, Christoph
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Robotics
Online Access: https://arxiv.org/abs/2603.23607
author Wagner, Royden
Tas, Omer Sahin
Villa, Jaime
Hauser, Felix
Shen, Yinzhe
Steiner, Marlon
Strutz, Dominik
Fernandez, Carlos
Kinzig, Christian
Guitierrez-Cabello, Guillermo S.
Königshof, Hendrik
Immel, Fabian
Schwarzkopf, Richard
Rack, Nils Alexander
Rösch, Kevin
Wang, Kaiwen
Pauls, Jan-Hendrik
Lauer, Martin
Gilitschenski, Igor
Caesar, Holger
Stiller, Christoph
contents In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset designed for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions, and detailed reasoning traces, facilitating in-context learning and few-shot generalization. The resulting benchmark for multimodal models, such as VLMs and VLAs, goes beyond safety and comfort metrics by evaluating instruction following and semantic coherence between model outputs. The multilingual reasoning traces in English, Spanish, and Chinese are from domain experts with diverse cultural backgrounds. Thus, our dataset is a unique resource for studying how different forms of reasoning affect driving competence. Our dataset is available at: https://hf.co/datasets/kit-mrt/kitscenes-longtail
format Preprint
id arxiv_https___arxiv_org_abs_2603_23607
institution arXiv
publishDate 2026
record_format arxiv
title LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
topic Computer Vision and Pattern Recognition
Robotics
url https://arxiv.org/abs/2603.23607