:: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Xingcheng; Larintzakis, Konstantinos; Guo, Hao; Zimmer, Walter; Liu, Mingyu; Cao, Hu; Zhang, Jiajie; Lakshminarasimhan, Venkatnarayanan; Strand, Leah; Knoll, Alois C.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.02449

Similar Items

CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
by: Zhou, Xingcheng, et al.
Published: (2026)

WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection
by: Zhou, Xingcheng, et al.
Published: (2024)

SGTA: Scene-Graph Based Multi-Modal Traffic Agent for Video Understanding
by: Zhou, Xingcheng, et al.
Published: (2026)

TUMTraf Event: Calibration and Fusion Resulting in a Dataset for Roadside Event-Based and RGB Cameras
by: Creß, Christian, et al.
Published: (2024)

PointCompress3D: A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems
by: Zimmer, Walter, et al.
Published: (2024)

GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events
by: Zhou, Xingcheng, et al.
Published: (2024)

Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
by: Liang, Lili, et al.
Published: (2024)

Understanding Complexity in VideoQA via Visual Program Generation
by: Eyzaguirre, Cristobal, et al.
Published: (2025)

Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA
by: Song, Zijie, et al.
Published: (2025)

VideoQA in the Era of LLMs: An Empirical Study
by: Xiao, Junbin, et al.
Published: (2024)

Vision Language Models in Autonomous Driving: A Survey and Outlook
by: Zhou, Xingcheng, et al.
Published: (2023)

RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
by: Parikh, Chirag, et al.
Published: (2025)

ENTER: Event Based Interpretable Reasoning for VideoQA
by: Ayyubi, Hammad, et al.
Published: (2025)

Reading Between the Lanes: Text VideoQA on the Road
by: Tom, George, et al.
Published: (2023)

ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
by: Salehi, Mohammadreza, et al.
Published: (2024)

VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
by: Guo, Jiangyuan, et al.
Published: (2024)

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
by: Chen, Qirui, et al.
Published: (2024)

TUMTraf EMOT: Event-Based Multi-Object Tracking Dataset and Baseline for Traffic Scenarios
by: Li, Mengyu, et al.
Published: (2025)

ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
by: Liang, Jianxin, et al.
Published: (2025)

SegRGB-X: General RGB-X Semantic Segmentation Model
by: Liu, Jiong, et al.
Published: (2026)

Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
by: Heyward, Joseph, et al.
Published: (2024)

Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
by: Liang, Jianxin, et al.
Published: (2025)

Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
by: Rawal, Ishaan Singh, et al.
Published: (2023)

QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
by: He, Zhixian, et al.
Published: (2024)

StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA
by: Hu, Yuhang, et al.
Published: (2025)

A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook
by: Liu, Mingyu, et al.
Published: (2024)

Safety-Critical Learning for Long-Tail Events: The TUM Traffic Accident Dataset
by: Zimmer, Walter, et al.
Published: (2025)

TUMTraf V2X Cooperative Perception Dataset
by: Zimmer, Walter, et al.
Published: (2024)

GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship Graphs
by: Liu, Mingyu, et al.
Published: (2024)

Towards Vision Zero: The TUM Traffic Accid3nD Dataset
by: Zimmer, Walter, et al.
Published: (2025)

UDVideoQA: A Traffic Video Question Answering Dataset for Multi-Object Spatio-Temporal Reasoning in Urban Dynamics
by: Vishal, Joseph Raj, et al.
Published: (2026)

Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
by: Ma, Jingtian, et al.
Published: (2025)

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
by: Rasheed, Hanoona, et al.
Published: (2025)

TimeLogic: A Temporal Logic Benchmark for Video QA
by: Swetha, Sirnam, et al.
Published: (2025)

Enhancing Highway Safety: Accident Detection on the A9 Test Stretch Using Roadside Sensors
by: Zimmer, Walter, et al.
Published: (2025)

VISTA: Video Interaction Spatio-Temporal Analysis Benchmark
by: Aparcedo, Alejandro, et al.
Published: (2026)

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
by: Zhang, Jianrui, et al.
Published: (2026)

Enhancing Scene Transition Awareness in Video Generation via Post-Training
by: Shen, Hanwen, et al.
Published: (2025)

InterAct-Video: Reasoning-Rich Video QA for Urban Traffic
by: Vishal, Joseph Raj, et al.
Published: (2025)