| Main Authors: | Yang, Jingchun; Zhang, Jinchang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.17930 |
Similar Items
Hierarchical Reasoning with Vision-Language Models for Incident Reports from Dashcam Videos
by: Yokoi, Shingo, et al.
Published: (2025)
TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos
by: Charoenpitaks, Korawat, et al.
Published: (2025)
DashCop: Automated E-ticket Generation for Two-Wheeler Traffic Violations Using Dashcam Videos
by: Rawat, Deepti, et al.
Published: (2025)
Estimation of Kinematic Motion from Dashcam Footage
by: Zhang, Evelyn, et al.
Published: (2025)
Addressing Out-of-Label Hazard Detection in Dashcam Videos: Insights from the COOOL Challenge
by: Duong, Anh-Kiet, et al.
Published: (2025)
Training-Free and Interpretable Hateful Video Detection via Multi-stage Adversarial Reasoning
by: Yang, Shuonan, et al.
Published: (2026)
Nexar Dashcam Collision Prediction Dataset and Challenge
by: Moura, Daniel C., et al.
Published: (2025)
Automated Genomic Interpretation via Concept Bottleneck Models for Medical Robotics
by: Li, Zijun, et al.
Published: (2025)
Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference
by: Yang, Yunxiang, et al.
Published: (2025)
Object Detection for Vehicle Dashcams using Transformers
by: Mustafa, Osama, et al.
Published: (2024)
SGTA: Scene-Graph Based Multi-Modal Traffic Agent for Video Understanding
by: Zhou, Xingcheng, et al.
Published: (2026)
From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events
by: Miao, Yan, et al.
Published: (2024)
Underground Mapping and Localization Based on Ground-Penetrating Radar
by: Zhang, Jinchang, et al.
Published: (2024)
Vision-Language Embodiment for Monocular Depth Estimation
by: Zhang, Jinchang, et al.
Published: (2025)
InterAct-Video: Reasoning-Rich Video QA for Urban Traffic
by: Vishal, Joseph Raj, et al.
Published: (2025)
BADAS: Context Aware Collision Prediction Using Real-World Dashcam Data
by: Goldshmidt, Roni, et al.
Published: (2025)
Fingerprinting New York City's Scaffolding Problem with Longitudinal Dashcam Data
by: Shapira, Dorin, et al.
Published: (2024)
RoLID-11K: A Dashcam Dataset for Small-Object Roadside Litter Detection
by: Wu, Tao, et al.
Published: (2026)
Adaptive Event Stream Slicing for Open-Vocabulary Event-Based Object Detection via Vision-Language Knowledge Distillation
by: Zhang, Jinchang, et al.
Published: (2025)
Fractal Autoregressive Depth Estimation with Continuous Token Diffusion
by: Zhang, Jinchang, et al.
Published: (2026)
Keypoint Detection and Description for Raw Bayer Images
by: Lin, Jiakai, et al.
Published: (2025)
Graph Integrated Multimodal Concept Bottleneck Model
by: Lin, Jiakai, et al.
Published: (2025)
Language-Depth Navigated Thermal and Visible Image Fusion
by: Zhang, Jinchang, et al.
Published: (2025)
Language Model Guided Interpretable Video Action Reasoning
by: Wang, Ning, et al.
Published: (2024)
UDVideoQA: A Traffic Video Question Answering Dataset for Multi-Object Spatio-Temporal Reasoning in Urban Dynamics
by: Vishal, Joseph Raj, et al.
Published: (2026)
Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement
by: Zhang, Dehuan, et al.
Published: (2023)
TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs
by: Arefeen, Md Adnan, et al.
Published: (2025)
IA2U: A Transfer Plugin with Multi-Prior for In-Air Model to Underwater
by: Zhou, Jingchun, et al.
Published: (2023)
PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions
by: Jing, Taotao, et al.
Published: (2021)
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
by: Shi, Yudi, et al.
Published: (2024)
Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus
by: Zhang, Jinchang, et al.
Published: (2024)
SignEye: Traffic Sign Interpretation from Vehicle First-Person View
by: Yang, Chuang, et al.
Published: (2024)
UTA-Sign: Unsupervised Thermal Video Augmentation via Event-Assisted Traffic Signage Sketching
by: Han, Yuqi, et al.
Published: (2025)
WaterHE-NeRF: Water-ray Tracing Neural Radiance Fields for Underwater Scene Reconstruction
by: Zhou, Jingchun, et al.
Published: (2023)
VideoWeaver: Multimodal Multi-View Video-to-Video Transfer for Embodied Agents
by: Eskandar, George, et al.
Published: (2026)
AgentCVR: Active Multi-Agent Cross-Video Reasoning via Script-Simulated Reinforcement Learning
by: Qiu, Yilun, et al.
Published: (2026)
Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation
by: Jiang, Haichao, et al.
Published: (2026)
Scaling Video Understanding via Compact Latent Multi-Agent Collaboration
by: Chen, Kerui, et al.
Published: (2026)
AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?
by: Yan, Hailong, et al.
Published: (2026)