:: Library Catalog

Bibliographic Details
Main Authors:	Zadeh, Danial Sadrian; Basir, Otman A.; Moshiri, Behzad
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2601.14438

Similar Items

An Optimal Cascade Feature-Level Spatiotemporal Fusion Strategy for Anomaly Detection in CAN Bus
by: Fatahi, Mohammad, et al.
Published: (2025)

Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding
by: Elhenawy, Mohammed, et al.
Published: (2025)

Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
by: De, Anik, et al.
Published: (2025)

Open World Scene Graph Generation using Vision Language Models
by: Dutta, Amartya, et al.
Published: (2025)

General Scene Adaptation for Vision-and-Language Navigation
by: Hong, Haodong, et al.
Published: (2025)

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
by: Jia, Baoxiong, et al.
Published: (2024)

DragTraffic: Interactive and Controllable Traffic Scene Generation for Autonomous Driving
by: Wang, Sheng, et al.
Published: (2024)

doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation
by: Roy, Parthib, et al.
Published: (2024)

DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)

Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving
by: Gao, Haoxiang, et al.
Published: (2025)

Zero-Shot Scene Understanding with Multimodal Large Language Models for Automated Vehicles
by: Elhenawy, Mohammed, et al.
Published: (2025)

Towards Driver Behavior Understanding: Weakly-Supervised Risk Perception in Driving Scenes
by: Agarwal, Nakul, et al.
Published: (2026)

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
by: Li, Yue, et al.
Published: (2025)

RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving
by: Zunair, Hasib, et al.
Published: (2024)

Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
by: Lohner, Aaron, et al.
Published: (2024)

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
by: Wang, Yuxuan, et al.
Published: (2024)

Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
by: Ma, Jingtian, et al.
Published: (2025)

X-Driver: Explainable Autonomous Driving with Vision-Language Models
by: Liu, Wei, et al.
Published: (2025)

MGNet: Monocular Geometric Scene Understanding for Autonomous Driving
by: Schön, Markus, et al.
Published: (2022)

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
by: Yang, Zongxin, et al.
Published: (2024)

StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
by: Oliveira, Daniel A. P., et al.
Published: (2025)

OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving
by: Liu, Pei, et al.
Published: (2025)

EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
by: Schäfer, Finn Rasmus, et al.
Published: (2026)

Embodied Agents for Efficient Exploration and Smart Scene Description
by: Bigazzi, Roberto, et al.
Published: (2023)

NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
by: Kamali, Danial, et al.
Published: (2025)

DriveIndia: An Object Detection Dataset for Diverse Indian Traffic Scenes
by: Kumar, Rishav, et al.
Published: (2025)

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)

PreGSU: A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network
by: Wang, Yuning, et al.
Published: (2024)

OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving
by: Zheng, Lianqing, et al.
Published: (2024)

LLMs Behind the Scenes: Enabling Narrative Scene Illustration
by: Roemmele, Melissa, et al.
Published: (2025)

Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models
by: Mohamud, Safaa Abdullahi Moallim, et al.
Published: (2025)

T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
by: Lv, Changsheng, et al.
Published: (2024)

Pascal-Weighted Genetic Algorithms: A Binomially-Structured Recombination Framework
by: Basir, Otman A.
Published: (2025)

ScenePilot-4K: A Large-Scale First-Person Dataset and Benchmark for Vision-Language Models in Autonomous Driving
by: Wang, Yujin, et al.
Published: (2026)

SIMSplat: Predictive Driving Scene Editing with Language-aligned 4D Gaussian Splatting
by: Park, Sung-Yeon, et al.
Published: (2025)

A Study of the Framework and Real-World Applications of Language Embedding for 3D Scene Understanding
by: Zaouali, Mahmoud Chick, et al.
Published: (2025)

SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes
by: Wang, Chuhan, et al.
Published: (2026)

Scenario Understanding of Traffic Scenes Through Large Visual Language Models
by: Rivera, Esteban, et al.
Published: (2025)

Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving
by: Zhou, Hongkuan, et al.
Published: (2025)

The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge
by: Peng, Jinghan, et al.
Published: (2025)