| Main Authors: | Chang, Chun-Peng, Wang, Chen-Yu, Caesar, Holger, Pagani, Alain |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.09512 |
Similar Items
Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation
by: Chang, Chun-Peng, et al.
Published: (2025)
Invaria: Learning Scale and Density Invariance in Point Clouds via Next-Resolution Prediction
by: Chang, Chun-Peng, et al.
Published: (2026)
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
by: Chang, Chun-Peng, et al.
Published: (2024)
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation
by: Chang, Chun-Peng, et al.
Published: (2024)
Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction
by: Wang, Shaoxiang, et al.
Published: (2024)
ICP-Flow: LiDAR Scene Flow Estimation with ICP
by: Lin, Yancong, et al.
Published: (2024)
Offline Tracking with Object Permanence
by: Liu, Xianzhong, et al.
Published: (2023)
TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding
by: Guo, Chaohong, et al.
Published: (2025)
How Auxiliary Reasoning Unleashes GUI Grounding in VLMs
by: Li, Weiming, et al.
Published: (2025)
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
by: Jiang, Bo, et al.
Published: (2025)
BikeScenes: Online LiDAR Semantic Segmentation for Bicycles
by: Goren, Denniz, et al.
Published: (2025)
Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs
by: Lim, Jeongkee, et al.
Published: (2024)
nuScenes Revisited: Progress and Challenges in Autonomous Driving
by: Fong, Whye Kit, et al.
Published: (2025)
ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs
by: Li, Jiangyang, et al.
Published: (2026)
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
by: Xie, Shaoyuan, et al.
Published: (2025)
4DRC-OCC: Robust Semantic Occupancy Prediction Through Fusion of 4D Radar and Camera
by: Ninfa, David, et al.
Published: (2026)
4D-RaDiff: Latent Diffusion for 4D Radar Point Cloud Generation
by: Kwok, Jimmie, et al.
Published: (2025)
BaSAL: Size-Balanced Warm Start Active Learning for LiDAR Semantic Segmentation
by: Wei, Jiarong, et al.
Published: (2023)
DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection
by: Fent, Felix, et al.
Published: (2024)
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
by: Zhou, Zijian, et al.
Published: (2023)
Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding
by: Zheng, Zelin, et al.
Published: (2026)
NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
by: Ljungbergh, William, et al.
Published: (2024)
Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?
by: Chen, Xin, et al.
Published: (2025)
Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning
by: Li, Yue, et al.
Published: (2025)
LeAP: Consistent multi-domain 3D labeling using Foundation Models
by: Gebraad, Simon, et al.
Published: (2025)
Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)
Med-R2: An Adversarial Benchmark for Evidence-Grounded Reasoning in Medical VLMs
by: Ma, Wen, et al.
Published: (2026)
G3FA: Geometry-guided GAN for Face Animation
by: Javanmardi, Alireza, et al.
Published: (2024)
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
by: Zhou, Zijian, et al.
Published: (2024)
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
by: Zong, Yongshuo, et al.
Published: (2025)
CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
by: Song, Rui, et al.
Published: (2025)
A Cognitive Paradigm Approach to Probe the Perception-Reasoning Interface in VLMs
by: Vaishnav, Mohit, et al.
Published: (2025)
ViFP: A Framework for Visual False Positive Detection to Enhance Reasoning Reliability in VLMs
by: Zhang, Ben, et al.
Published: (2025)
AsyncBEV: Cross-modal Flow Alignment in Asynchronous 3D Object Detection
by: Wang, Shiming, et al.
Published: (2026)
CLGRPO: Reasoning Ability Enhancement for Small VLMs
by: Wang, Fanyi, et al.
Published: (2025)
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
by: Xie, Yaxu, et al.
Published: (2024)
What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs
by: Lin, Jiaping, et al.
Published: (2026)
VLMs Guided Interpretable Decision Making for Autonomous Driving
by: Hu, Xin, et al.
Published: (2025)
UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities
by: Wang, Shiming, et al.
Published: (2023)
Probing Visual Language Priors in VLMs
by: Luo, Tiange, et al.
Published: (2024)