Bibliographic Details
Main Authors:	Huang, Yizhou; Cheng, Yihua; Wang, Kezhi
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2409.20364

Similar Items

Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
by: Huang, Yizhou, et al.
Published: (2025)

V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views
by: You, Junwei, et al.
Published: (2026)

Behavioral Cloning Models Reality Check for Autonomous Driving
by: Yildirim, Mustafa, et al.
Published: (2024)

DriveGPT: Scaling Autoregressive Behavior Models for Driving
by: Huang, Xin, et al.
Published: (2024)

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
by: Zhou, Enshen, et al.
Published: (2025)

NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices
by: Zhang, Zhiyong, et al.
Published: (2024)

SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
by: Zantout, Nader, et al.
Published: (2025)

DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving
by: Huang, Zilin, et al.
Published: (2026)

A Survey on Vision-Language-Action Models for Autonomous Driving
by: Jiang, Sicong, et al.
Published: (2025)

CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
by: Liao, Haicheng, et al.
Published: (2025)

TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
by: Han, Yi, et al.
Published: (2025)

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
by: Park, Sung-Yeon, et al.
Published: (2025)

Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent
by: He, Linfeng, et al.
Published: (2024)

Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation in Autonomous Driving Models
by: Mayumu, Nicanor, et al.
Published: (2026)

VERDI: VLM-Embedded Reasoning for Autonomous Driving
by: Feng, Bowen, et al.
Published: (2025)

Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving
by: Gao, Haoxiang, et al.
Published: (2025)

Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving
by: Yang, Sheng, et al.
Published: (2025)

On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
by: Wu, Yiming, et al.
Published: (2025)

VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving
by: Huang, Zilin, et al.
Published: (2024)

DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models
by: Song, Jingyu, et al.
Published: (2025)

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
by: Yang, Zhenjie, et al.
Published: (2025)

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
by: Zhou, Gengze, et al.
Published: (2024)

Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
by: Zhang, Mike, et al.
Published: (2024)

SEAL: Vision-Language Model-Based Safe End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling
by: You, Junwei, et al.
Published: (2025)

Learning Velocity and Acceleration: Self-Supervised Motion Consistency for Pedestrian Trajectory Prediction
by: Huang, Yizhou, et al.
Published: (2025)

CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models
by: Sheng, Zihao, et al.
Published: (2025)

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
by: Zhou, Yang, et al.
Published: (2026)

DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving
by: HU, Haibo, et al.
Published: (2025)

LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving
by: Shao, Hao, et al.
Published: (2026)

HiST-VLA: A Hierarchical Spatio-Temporal Vision-Language-Action Model for End-to-End Autonomous Driving
by: Wang, Yiru, et al.
Published: (2026)

FoSS: Modeling Long Range Dependencies and Multimodal Uncertainty in Trajectory Prediction via Fourier State Space Integration
by: Huang, Yizhou, et al.
Published: (2026)

Vega: Learning to Drive with Natural Language Instructions
by: Zuo, Sicheng, et al.
Published: (2026)

A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving
by: Long, Keke, et al.
Published: (2025)

DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving
by: Jiang, Anqing, et al.
Published: (2025)

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
by: Sha, Hao, et al.
Published: (2023)

A Language Agent for Autonomous Driving
by: Mao, Jiageng, et al.
Published: (2023)

VECTOR-Drive: Tightly Coupled Vision-Language and Trajectory Expert Routing for End-to-End Autonomous Driving
by: Zhao, Rui, et al.
Published: (2026)

DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving
by: Godbole, Mihir, et al.
Published: (2025)

LangCoop: Collaborative Driving with Language
by: Gao, Xiangbo, et al.
Published: (2025)

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
by: Zhou, Yang, et al.
Published: (2026)