| Main Authors: | Zhou, Xingcheng, Larintzakis, Konstantinos, Guo, Hao, Zimmer, Walter, Liu, Mingyu, Cao, Hu, Zhang, Jiajie, Lakshminarasimhan, Venkatnarayanan, Strand, Leah, Knoll, Alois C. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.02449 |
Similar Items
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
by: Zhou, Xingcheng, et al.
Published: (2026)
WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection
by: Zhou, Xingcheng, et al.
Published: (2024)
SGTA: Scene-Graph Based Multi-Modal Traffic Agent for Video Understanding
by: Zhou, Xingcheng, et al.
Published: (2026)
TUMTraf Event: Calibration and Fusion Resulting in a Dataset for Roadside Event-Based and RGB Cameras
by: Creß, Christian, et al.
Published: (2024)
PointCompress3D: A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems
by: Zimmer, Walter, et al.
Published: (2024)
GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events
by: Zhou, Xingcheng, et al.
Published: (2024)
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
by: Liang, Lili, et al.
Published: (2024)
Understanding Complexity in VideoQA via Visual Program Generation
by: Eyzaguirre, Cristobal, et al.
Published: (2025)
Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA
by: Song, Zijie, et al.
Published: (2025)
VideoQA in the Era of LLMs: An Empirical Study
by: Xiao, Junbin, et al.
Published: (2024)
Vision Language Models in Autonomous Driving: A Survey and Outlook
by: Zhou, Xingcheng, et al.
Published: (2023)
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
by: Parikh, Chirag, et al.
Published: (2025)
ENTER: Event Based Interpretable Reasoning for VideoQA
by: Ayyubi, Hammad, et al.
Published: (2025)
Reading Between the Lanes: Text VideoQA on the Road
by: Tom, George, et al.
Published: (2023)
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
by: Salehi, Mohammadreza, et al.
Published: (2024)
VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
by: Guo, Jiangyuan, et al.
Published: (2024)
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
by: Chen, Qirui, et al.
Published: (2024)
TUMTraf EMOT: Event-Based Multi-Object Tracking Dataset and Baseline for Traffic Scenarios
by: Li, Mengyu, et al.
Published: (2025)
ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
by: Liang, Jianxin, et al.
Published: (2025)
SegRGB-X: General RGB-X Semantic Segmentation Model
by: Liu, Jiong, et al.
Published: (2026)
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
by: Heyward, Joseph, et al.
Published: (2024)
Beyond Isolated Facts: Synthesizing Narrative and Grounded Supervision for VideoQA
by: Liang, Jianxin, et al.
Published: (2025)
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
by: Rawal, Ishaan Singh, et al.
Published: (2023)
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
by: He, Zhixian, et al.
Published: (2024)
StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA
by: Hu, Yuhang, et al.
Published: (2025)
A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook
by: Liu, Mingyu, et al.
Published: (2024)
Safety-Critical Learning for Long-Tail Events: The TUM Traffic Accident Dataset
by: Zimmer, Walter, et al.
Published: (2025)
TUMTraf V2X Cooperative Perception Dataset
by: Zimmer, Walter, et al.
Published: (2024)
GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship Graphs
by: Liu, Mingyu, et al.
Published: (2024)
Towards Vision Zero: The TUM Traffic Accid3nD Dataset
by: Zimmer, Walter, et al.
Published: (2025)
UDVideoQA: A Traffic Video Question Answering Dataset for Multi-Object Spatio-Temporal Reasoning in Urban Dynamics
by: Vishal, Joseph Raj, et al.
Published: (2026)
Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
by: Ma, Jingtian, et al.
Published: (2025)
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
by: Rasheed, Hanoona, et al.
Published: (2025)
TimeLogic: A Temporal Logic Benchmark for Video QA
by: Swetha, Sirnam, et al.
Published: (2025)
Enhancing Highway Safety: Accident Detection on the A9 Test Stretch Using Roadside Sensors
by: Zimmer, Walter, et al.
Published: (2025)
VISTA: Video Interaction Spatio-Temporal Analysis Benchmark
by: Aparcedo, Alejandro, et al.
Published: (2026)
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
by: Zhang, Jianrui, et al.
Published: (2026)
Enhancing Scene Transition Awareness in Video Generation via Post-Training
by: Shen, Hanwen, et al.
Published: (2025)
InterAct-Video: Reasoning-Rich Video QA for Urban Traffic
by: Vishal, Joseph Raj, et al.
Published: (2025)