| Main Authors: | Zadeh, Danial Sadrian, Basir, Otman A., Moshiri, Behzad |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.14438 |
Similar Items
An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus
by: Fatahi, Mohammad, et al.
Published: (2025)
Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding
by: Elhenawy, Mohammed, et al.
Published: (2025)
Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
by: De, Anik, et al.
Published: (2025)
Open World Scene Graph Generation using Vision Language Models
by: Dutta, Amartya, et al.
Published: (2025)
General Scene Adaptation for Vision-and-Language Navigation
by: Hong, Haodong, et al.
Published: (2025)
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
by: Jia, Baoxiong, et al.
Published: (2024)
DragTraffic: Interactive and Controllable Traffic Scene Generation for Autonomous Driving
by: Wang, Sheng, et al.
Published: (2024)
doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation
by: Roy, Parthib, et al.
Published: (2024)
DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)
Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving
by: Gao, Haoxiang, et al.
Published: (2025)
Zero-Shot Scene Understanding with Multimodal Large Language Models for Automated Vehicles
by: Elhenawy, Mohammed, et al.
Published: (2025)
Towards Driver Behavior Understanding: Weakly-Supervised Risk Perception in Driving Scenes
by: Agarwal, Nakul, et al.
Published: (2026)
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
by: Li, Yue, et al.
Published: (2025)
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving
by: Zunair, Hasib, et al.
Published: (2024)
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
by: Lohner, Aaron, et al.
Published: (2024)
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
by: Wang, Yuxuan, et al.
Published: (2024)
Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
by: Ma, Jingtian, et al.
Published: (2025)
X-Driver: Explainable Autonomous Driving with Vision-Language Models
by: Liu, Wei, et al.
Published: (2025)
MGNet: Monocular Geometric Scene Understanding for Autonomous Driving
by: Schön, Markus, et al.
Published: (2022)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
by: Yang, Zongxin, et al.
Published: (2024)
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
by: Oliveira, Daniel A. P., et al.
Published: (2025)
OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving
by: Liu, Pei, et al.
Published: (2025)
EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
by: Schäfer, Finn Rasmus, et al.
Published: (2026)
Embodied Agents for Efficient Exploration and Smart Scene Description
by: Bigazzi, Roberto, et al.
Published: (2023)
NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
by: Kamali, Danial, et al.
Published: (2025)
DriveIndia: An Object Detection Dataset for Diverse Indian Traffic Scenes
by: Kumar, Rishav, et al.
Published: (2025)
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)
PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network
by: Wang, Yuning, et al.
Published: (2024)
OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving
by: Zheng, Lianqing, et al.
Published: (2024)
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
by: Roemmele, Melissa, et al.
Published: (2025)
Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2025)
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
by: Lv, Changsheng, et al.
Published: (2024)
Pascal-Weighted Genetic Algorithms: A Binomially-Structured Recombination Framework
by: Basir, Otman A.
Published: (2025)
ScenePilot-4K: A Large-Scale First-Person Dataset and Benchmark for Vision-Language Models in Autonomous Driving
by: Wang, Yujin, et al.
Published: (2026)
SIMSplat: Predictive Driving Scene Editing with Language-aligned 4D Gaussian Splatting
by: Park, Sung-Yeon, et al.
Published: (2025)
A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding
by: Zaouali, Mahmoud Chick, et al.
Published: (2025)
SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes
by: Wang, Chuhan, et al.
Published: (2026)
Scenario Understanding of Traffic Scenes Through Large Visual Language Models
by: Rivera, Esteban, et al.
Published: (2025)
Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving
by: Zhou, Hongkuan, et al.
Published: (2025)
The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge
by: Peng, Jinghan, et al.
Published: (2025)