Library Catalog

Bibliographic Details
Main Authors:	Chang, Chun-Peng, Wang, Chen-Yu, Caesar, Holger, Pagani, Alain
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.09512

Similar Items

Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation
by: Chang, Chun-Peng, et al.
Published: (2025)

Invaria: Learning Scale and Density Invariance in Point Clouds via Next-Resolution Prediction
by: Chang, Chun-Peng, et al.
Published: (2026)

MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
by: Chang, Chun-Peng, et al.
Published: (2024)

3D Spatial Understanding in MLLMs: Disambiguation and Evaluation
by: Chang, Chun-Peng, et al.
Published: (2024)

Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction
by: Wang, Shaoxiang, et al.
Published: (2024)

ICP-Flow: LiDAR Scene Flow Estimation with ICP
by: Lin, Yancong, et al.
Published: (2024)

Offline Tracking with Object Permanence
by: Liu, Xianzhong, et al.
Published: (2023)

TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding
by: Guo, Chaohong, et al.
Published: (2025)

How Auxiliary Reasoning Unleashes GUI Grounding in VLMs
by: Li, Weiming, et al.
Published: (2025)

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
by: Jiang, Bo, et al.
Published: (2025)

BikeScenes: Online LiDAR Semantic Segmentation for Bicycles
by: Goren, Denniz, et al.
Published: (2025)

Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs
by: Lim, Jeongkee, et al.
Published: (2024)

nuScenes Revisited: Progress and Challenges in Autonomous Driving
by: Fong, Whye Kit, et al.
Published: (2025)

ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs
by: Li, Jiangyang, et al.
Published: (2026)

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
by: Xie, Shaoyuan, et al.
Published: (2025)

4DRC-OCC: Robust Semantic Occupancy Prediction Through Fusion of 4D Radar and Camera
by: Ninfa, David, et al.
Published: (2026)

4D-RaDiff: Latent Diffusion for 4D Radar Point Cloud Generation
by: Kwok, Jimmie, et al.
Published: (2025)

BaSAL: Size-Balanced Warm Start Active Learning for LiDAR Semantic Segmentation
by: Wei, Jiarong, et al.
Published: (2023)

DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection
by: Fent, Felix, et al.
Published: (2024)

VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
by: Zhou, Zijian, et al.
Published: (2023)

Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)

NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
by: Ljungbergh, William, et al.
Published: (2024)

Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?
by: Chen, Xin, et al.
Published: (2025)

Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning
by: Li, Yue, et al.
Published: (2025)

LeAP: Consistent multi-domain 3D labeling using Foundation Models
by: Gebraad, Simon, et al.
Published: (2025)

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)

Med-R2: An Adversarial Benchmark for Evidence-Grounded Reasoning in Medical VLMs
by: Ma, Wen, et al.
Published: (2026)

G3FA: Geometry-guided GAN for Face Animation
by: Javanmardi, Alireza, et al.
Published: (2024)

OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
by: Zhou, Zijian, et al.
Published: (2024)

Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
by: Zong, Yongshuo, et al.
Published: (2025)

CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
by: Song, Rui, et al.
Published: (2025)

A Cognitive Paradigm Approach to Probe the Perception-Reasoning Interface in VLMs
by: Vaishnav, Mohit, et al.
Published: (2025)

ViFP: A Framework for Visual False Positive Detection to Enhance Reasoning Reliability in VLMs
by: Zhang, Ben, et al.
Published: (2025)

AsyncBEV: Cross-modal Flow Alignment in Asynchronous 3D Object Detection
by: Wang, Shiming, et al.
Published: (2026)

CLGRPO: Reasoning Ability Enhancement for Small VLMs
by: Wang, Fanyi, et al.
Published: (2025)

SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
by: Xie, Yaxu, et al.
Published: (2024)

What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs
by: Lin, Jiaping, et al.
Published: (2026)

VLMs Guided Interpretable Decision Making for Autonomous Driving
by: Hu, Xin, et al.
Published: (2025)

UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities
by: Wang, Shiming, et al.
Published: (2023)

Probing Visual Language Priors in VLMs
by: Luo, Tiange, et al.
Published: (2024)